From: "Danilo Krummrich" <dakr@kernel.org>
To: "Boris Brezillon" <boris.brezillon@collabora.com>
Cc: "Philipp Stanner" <phasta@mailbox.org>,
phasta@kernel.org, "Tvrtko Ursulin" <tvrtko.ursulin@igalia.com>,
dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org,
kernel-dev@igalia.com, intel-xe@lists.freedesktop.org,
cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
"Christian König" <christian.koenig@amd.com>,
"Leo Liu" <Leo.Liu@amd.com>, "Maíra Canal" <mcanal@igalia.com>,
"Matthew Brost" <matthew.brost@intel.com>,
"Michal Koutný" <mkoutny@suse.com>,
"Michel Dänzer" <michel.daenzer@mailbox.org>,
"Pierre-Eric Pelloux-Prayer" <pierre-eric.pelloux-prayer@amd.com>,
"Rob Clark" <robdclark@gmail.com>, "Tejun Heo" <tj@kernel.org>,
"Alexandre Courbot" <acourbot@nvidia.com>,
"Alistair Popple" <apopple@nvidia.com>,
"John Hubbard" <jhubbard@nvidia.com>,
"Joel Fernandes" <joelagnelf@nvidia.com>,
"Timur Tabi" <ttabi@nvidia.com>,
"Alex Deucher" <alexander.deucher@amd.com>,
"Lucas De Marchi" <lucas.demarchi@intel.com>,
"Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
"Rodrigo Vivi" <rodrigo.vivi@intel.com>,
"Rob Herring" <robh@kernel.org>,
"Steven Price" <steven.price@arm.com>,
"Liviu Dudau" <liviu.dudau@arm.com>,
"Daniel Almeida" <daniel.almeida@collabora.com>,
"Alice Ryhl" <aliceryhl@google.com>,
"Boqun Feng" <boqunf@netflix.com>,
"Grégoire Péan" <gpean@netflix.com>,
"Simona Vetter" <simona@ffwll.ch>,
airlied@gmail.com
Subject: Re: [RFC v8 00/21] DRM scheduling cgroup controller
Date: Tue, 30 Sep 2025 12:58:29 +0200 [thread overview]
Message-ID: <DD62YFG2CJ36.1NFKRTR2ZKD6V@kernel.org> (raw)
In-Reply-To: <20250930121229.4f265e0c@fedora>
On Tue Sep 30, 2025 at 12:12 PM CEST, Boris Brezillon wrote:
> So, my take on that is that what we want ultimately is to have the
> functionality provided by drm_sched split into different
> components that can be used in isolation, or combined to provide
> advanced scheduling.
>
> JobQueue:
> - allows you to queue jobs with their deps
> - dequeues jobs once their deps are met
> Not too sure if we want a push or a pull model for the job dequeuing,
> but the idea is that once the job is dequeued, ownership is passed to
> the SW entity that dequeued it. Note that I intentionally didn't add
> the timeout handling here, because dequeueing a job doesn't necessarily
> mean it's started immediately. If you're dealing with HW queues, you
> might have to wait for a slot to become available. If you're dealing
> with something like Mali-CSF, where the amount of FW slots is limited,
> you want to wait for your execution context to be passed to the FW for
> scheduling, and the final situation is the full-fledged FW scheduling,
> where you want things to start as soon as you have space in your FW
> queue (AKA ring-buffer?).
>
> JobHWDispatcher: (not sure about the name, I'm bad at naming things)
> This object basically pulls ready-jobs from one or multiple JobQueues
> into its own queue, and wait for a HW slot to become available. If you
> go for the push model, the job gets pushed to the HW dispatcher queue
> and waits here until a HW slot becomes available.
> That's where timeouts should be handled, because the job only becomes
> active when it gets pushed to a HW slot. I guess if we want a
> resubmit mechanism, it would have to take place here, but give how
> tricky this has been, I'd be tempted to leave that to drivers, that is,
> let them requeue the non-faulty jobs directly to their
> JobHWDispatcher implementation after a reset.
>
> FWExecutionContextScheduler: (again, pick a different name if you want)
> This scheduler doesn't know about jobs, meaning there's a
> driver-specific entity that needs to dequeue jobs from the JobQueue
> and push those to the relevant ringbuffer. Once a FWExecutionContext
> has something to execute, it becomes a candidate for
> FWExecutionContextScheduler, which gets to decide which set of
> FWExecutionContext get a chance to be scheduled by the FW.
> That one is for Mali-CSF case I described above, and I'm not too sure
> we want it to be generic, at least not until we have another GPU driver
> needing the same kind of scheduling. Again, you want to defer the
> timeout handling to this component, because the timer should only
> start/resume when the FWExecutionContext gets scheduled, and it should
> be paused as soon as the context gets evicted.
This sounds pretty much like the existing design with the Panthor group
scheduler layered on top of it, no?
Though, one of the fundamental problems I'd like to get rid of is that job
ownership is transferred between two components with fundamentally different
lifetimes (entity and scheduler).
Instead, I think the new Jobqueue should always own and always dispatch jobs
directly and provide some "control API" to be instructed by an external
component (orchestrator) on top of it when and to which ring to dispatch jobs.
The group scheduling logic you need for some Mali GPUs can either be implemented
by hooks into this orchestrator or by a separate component that attaches to the
same control API of the Jobqueue.
> TLDR; I think the main problem we had with drm_sched is that it had
> this clear drm_sched_entity/drm_gpu_scheduler separation, but those two
> components where tightly tied together, with no way to use
> drm_sched_entity alone for instance, and this led to the weird
> lifetime/ownership issues that the rust effort made more apparent. If we
> get to design something new, I think we should try hard to get a clear
> isolation between each of these components so they can be used alone or
> combined, with a clear job ownership model.
This I agree with, but as explained above I'd go even one step further.
next prev parent reply other threads:[~2025-09-30 10:58 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-03 15:23 [RFC v8 00/21] DRM scheduling cgroup controller Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 01/21] drm/sched: Add some scheduling quality unit tests Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 02/21] drm/sched: Add some more " Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 03/21] drm/sched: Implement RR via FIFO Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 04/21] drm/sched: Consolidate entity run queue management Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 05/21] drm/sched: Move run queue related code into a separate file Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 06/21] drm/sched: Free all finished jobs at once Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 07/21] drm/sched: Account entity GPU time Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 08/21] drm/sched: Remove idle entity from tree Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 09/21] drm/sched: Add fair scheduling policy Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 10/21] drm/sched: Break submission patterns with some randomness Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 11/21] drm/sched: Remove FIFO and RR and simplify to a single run queue Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 12/21] drm/sched: Embed run queue singleton into the scheduler Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 13/21] cgroup: Add the DRM cgroup controller Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 14/21] cgroup/drm: Track DRM clients per cgroup Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 15/21] cgroup/drm: Add scheduling weight callback Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 16/21] cgroup/drm: Introduce weight based scheduling control Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 17/21] drm/sched: Add helper for tracking entities per client Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 18/21] drm/sched: Add helper for DRM cgroup controller weight notifications Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 19/21] drm/amdgpu: Register with the DRM scheduling cgroup controller Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 20/21] drm/xe: Allow changing GuC scheduling priority Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 21/21] drm/xe: Register with the DRM scheduling cgroup controller Tvrtko Ursulin
2025-09-04 12:08 ` Tvrtko Ursulin
2025-09-29 14:07 ` [RFC v8 00/21] " Danilo Krummrich
2025-09-30 9:00 ` Philipp Stanner
2025-09-30 9:28 ` DRM Jobqueue design (was "[RFC v8 00/21] DRM scheduling cgroup controller") Danilo Krummrich
2025-09-30 10:12 ` [RFC v8 00/21] DRM scheduling cgroup controller Boris Brezillon
2025-09-30 10:58 ` Danilo Krummrich [this message]
2025-09-30 11:57 ` Boris Brezillon
2025-10-07 14:44 ` Danilo Krummrich
2025-10-07 15:44 ` Boris Brezillon
2025-10-23 11:18 ` Tvrtko Ursulin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=DD62YFG2CJ36.1NFKRTR2ZKD6V@kernel.org \
--to=dakr@kernel.org \
--cc=Leo.Liu@amd.com \
--cc=acourbot@nvidia.com \
--cc=airlied@gmail.com \
--cc=alexander.deucher@amd.com \
--cc=aliceryhl@google.com \
--cc=amd-gfx@lists.freedesktop.org \
--cc=apopple@nvidia.com \
--cc=boqunf@netflix.com \
--cc=boris.brezillon@collabora.com \
--cc=cgroups@vger.kernel.org \
--cc=christian.koenig@amd.com \
--cc=daniel.almeida@collabora.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=gpean@netflix.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=jhubbard@nvidia.com \
--cc=joelagnelf@nvidia.com \
--cc=kernel-dev@igalia.com \
--cc=linux-kernel@vger.kernel.org \
--cc=liviu.dudau@arm.com \
--cc=lucas.demarchi@intel.com \
--cc=matthew.brost@intel.com \
--cc=mcanal@igalia.com \
--cc=michel.daenzer@mailbox.org \
--cc=mkoutny@suse.com \
--cc=phasta@kernel.org \
--cc=phasta@mailbox.org \
--cc=pierre-eric.pelloux-prayer@amd.com \
--cc=robdclark@gmail.com \
--cc=robh@kernel.org \
--cc=rodrigo.vivi@intel.com \
--cc=simona@ffwll.ch \
--cc=steven.price@arm.com \
--cc=thomas.hellstrom@linux.intel.com \
--cc=tj@kernel.org \
--cc=ttabi@nvidia.com \
--cc=tvrtko.ursulin@igalia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).