From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 348E9CD5BB1 for ; Sat, 23 May 2026 06:06:39 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 5609810E2DB; Sat, 23 May 2026 06:06:38 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=collabora.com header.i=@collabora.com header.b="qnA7Em3p"; dkim-atps=neutral Received: from bali.collaboradmins.com (bali.collaboradmins.com [148.251.105.195]) by gabe.freedesktop.org (Postfix) with ESMTPS id 008AB10E2DB for ; Sat, 23 May 2026 06:06:36 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=collabora.com; s=mail; t=1779516395; bh=oAI3SHc+aB3CaI0JG4n4ZyhQS2wx8sZEaBLdes9LjZA=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=qnA7Em3pVsZLgp/KCFbMriLf6qtrQXoRbEGsMvcY509oNhRs6fr7IIHchIxxHgu82 zY8McjNKwfJMkvra0c91Tz3/AKk2GrTkulI9JtpEiGyYGiyE6jP6Wjps/vJMFXVg2e 623ZHh5Y7wlCSe700Rq+0ogfdmYbo6X2B2vLeus7MMx3761wPfEbEoPDjjBg7Kz2uu HvSmLKDOaLdyBFntEeWdZKeVTTca8rSsioepEV/GCMvw+WTLJ4G5QS0Aa/70orZW/D jVqQWdHKQUvrdX0rqoXEXoDl9n7J3fg5fI1piRilmXJvPfSzi/sjq68s3qQpAlRHlP ktfgyn4NebnhA== Received: from fedora (unknown [100.64.0.11]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange ECDHE (prime256v1) server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: bbrezillon) by bali.collaboradmins.com (Postfix) with ESMTPSA id DF3E817E05B5; Sat, 23 May 2026 08:06:34 +0200 (CEST) Date: Sat, 23 May 2026 08:06:29 +0200 From: Boris Brezillon To: Tvrtko Ursulin Cc: dri-devel@lists.freedesktop.org, kernel-dev@igalia.com, Liviu Dudau , Steven Price Subject: Re: [RFC 2/2] drm/panthor: Use separate workqueue for DRM scheduler Message-ID: <20260523080629.1ce22098@fedora> In-Reply-To: <7839ed28-77a8-42ff-8c9b-ac160e0f3a8f@igalia.com> References: <20260522113817.11745-1-tvrtko.ursulin@igalia.com> <20260522113817.11745-3-tvrtko.ursulin@igalia.com> <20260522174831.34983d27@fedora> <7839ed28-77a8-42ff-8c9b-ac160e0f3a8f@igalia.com> Organization: Collabora X-Mailer: Claws Mail 4.4.0 (GTK 3.24.52; x86_64-redhat-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" On Fri, 22 May 2026 17:25:18 +0100 Tvrtko Ursulin wrote: > On 22/05/2026 17:08, Boris Brezillon wrote: > > On Fri, 22 May 2026 12:38:17 +0100 > > Tvrtko Ursulin wrote: > > > >> Currently an unordered workqueue is used for the DRM scheduler which means > >> its concurrency is externally managed, and given there is one scheduler > >> instance per userspace queue, that means workqueue management logic is > >> within its rights to spawn many kernel threads to submit their respective > >> jobs. > >> > >> Problem there is that all run job callbacks are serialized on the device > >> global mutex, making the potential thread storm just causing lock > >> contention. > > > > Yeah, so initially we were not supposed to take the lock over the whole > > run_job() function. We should normally be able to queue things to the > > ring buffer, and only briefly take the lock to check if the context is > > still resident and kick the group scheduler if it's not. I agree that > > in practice it turned in a huge synchronization point. I guess we should > > consider turning that mutex into a rwsem that's taken in write mode in > > the tick path, and read-mode elsewhere. > > There is some software state modified too, so I am not sure how easy or > hard it would be to make run job only hold the read lock? Yeah, it's probably not as easy as it sounds. > > >> If we add a separate ordered workqueue for the DRM scheduler integration > >> we can avoid this problem, since the ordered property directly expresses > >> the nature of the submission backend implementation. > > > > I don't see alloc_ordered_workqueue() being used for the sched_wq > > workqueue, is that intended (according to this comment, you seem to > > want an ordered wq). > > Yeah, it seems I mistyped the wq allocation below. > > >> And considering the other user of this workqueue, the free job callback, > >> which is not globally serialized in this manner so could be thought to > >> potentially regress with this change, it should not be the case since > >> commit > >> a58f317c1ca0 ("drm/sched: Free all finished jobs at once") > >> made the DRM scheduler handle the cleanup of finished jobs more promptly. > >> > >> Signed-off-by: Tvrtko Ursulin > >> Cc: Boris Brezillon > >> Cc: Liviu Dudau > >> Cc: Steven Price > >> --- > >> drivers/gpu/drm/panthor/panthor_sched.c | 27 ++++++++++++++++--------- > >> 1 file changed, 17 insertions(+), 10 deletions(-) > >> > >> diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c > >> index 2bee1c92fb9e..cc6b3e2b015a 100644 > >> --- a/drivers/gpu/drm/panthor/panthor_sched.c > >> +++ b/drivers/gpu/drm/panthor/panthor_sched.c > >> @@ -147,13 +147,11 @@ struct panthor_scheduler { > >> struct panthor_device *ptdev; > >> > >> /** > >> - * @wq: Workqueue used by our internal scheduler logic and > >> - * drm_gpu_scheduler. > >> + * @wq: Workqueue used by our internal scheduler logic. > >> * > >> * Used for the scheduler tick, group update or other kind of FW > >> * event processing that can't be handled in the threaded interrupt > >> - * path. Also passed to the drm_gpu_scheduler instances embedded > >> - * in panthor_queue. > >> + * path. > >> */ > >> struct workqueue_struct *wq; > >> > >> @@ -166,6 +164,14 @@ struct panthor_scheduler { > >> */ > >> struct workqueue_struct *heap_alloc_wq; > >> > >> + /** > >> + * @sched_wq: Workqueue used for the DRM scheduler. > >> + * > >> + * Workqueue used for drm_gpu_scheduler instances embedded in > >> + * panthor_queue. > >> + */ > >> + struct workqueue_struct *sched_wq; > >> + > >> /** @tick_work: Work executed on a scheduling tick. */ > >> struct delayed_work tick_work; > >> > >> @@ -3488,7 +3494,7 @@ group_create_queue(struct panthor_group *group, > >> { > >> struct drm_sched_init_args sched_args = { > >> .ops = &panthor_queue_sched_ops, > >> - .submit_wq = group->ptdev->scheduler->wq, > >> + .submit_wq = group->ptdev->scheduler->sched_wq, > >> /* > >> * The credit limit argument tells us the total number of > >> * instructions across all CS slots in the ringbuffer, with > >> @@ -4078,6 +4084,9 @@ static void panthor_sched_fini(struct drm_device *ddev, void *res) > >> if (sched->heap_alloc_wq) > >> destroy_workqueue(sched->heap_alloc_wq); > >> > >> + if (sched->sched_wq) > >> + destroy_workqueue(sched->sched_wq); > >> + > >> for (prio = PANTHOR_CSG_PRIORITY_COUNT - 1; prio >= 0; prio--) { > >> drm_WARN_ON(ddev, !list_empty(&sched->groups.runnable[prio])); > >> drm_WARN_ON(ddev, !list_empty(&sched->groups.idle[prio])); > >> @@ -4167,13 +4176,11 @@ int panthor_sched_init(struct panthor_device *ptdev) > >> * FW is smart enough to fall back on other methods if the kernel can't > >> * allocate memory, and fail the tiling job if none of these > >> * countermeasures worked. > >> - * > >> - * Set WQ_MEM_RECLAIM on sched->wq to unblock the situation when the > >> - * system is running out of memory. > >> */ > >> sched->heap_alloc_wq = alloc_workqueue("panthor-heap-alloc", WQ_UNBOUND, 0); > >> - sched->wq = alloc_workqueue("panthor-csf-sched", WQ_MEM_RECLAIM | WQ_UNBOUND, 0); > >> - if (!sched->wq || !sched->heap_alloc_wq) { > >> + sched->wq = alloc_workqueue("panthor-csf-sched", WQ_UNBOUND, 0); > >> + sched->sched_wq = alloc_workqueue("panthor-drm-sched", WQ_MEM_RECLAIM, 0); > > > > The other one also needs MEM_RECLAIM, because you need work items > > queued to sched->wq to run to guarantee forward progress, and you need > > to guarantee forward progress to reclaim GPU mem. > > Ack, I had a feeling that might be the case. > > I will respin next week or so. Or if you tell me the global lock can be > easily dropped from .run_job I can drop and forget about it. Wider > context is that I am experimenting with kthread_worker conversion and > trying to polish a > somewhat-broken-but-showing-great-latency-improvements branch. Yep, I know, your experimental branch is actually on my list of things to look at/test ;-). > For that > I can kind of take either one global worker, or one worker per client > route for the RFC, no big deal either way for the prototype. We probably want a worker per-prio (and possibly per-cpu), but certainly not one per-client, or you'll end up with the thread explosion that was addressed by the kthread -> workqueue transition. Also, I didn't look at your panthor changes in this branch yet, but if we're switching drm_sched to kthread workers, we probably want to transition most existing panthor works to kthread_work, because some FW events might need to processed for the GPU context to be unblocked, and if we keep queuing those to a regular workqueue, they will be lagging behing the HI_PRIO thread you have for HI_PRIO contexts.