From: Boris Brezillon <boris.brezillon@collabora.com>
To: "Danilo Krummrich" <dakr@kernel.org>
Cc: "Philipp Stanner" <phasta@mailbox.org>,
phasta@kernel.org, "Tvrtko Ursulin" <tvrtko.ursulin@igalia.com>,
dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org,
kernel-dev@igalia.com, intel-xe@lists.freedesktop.org,
cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
"Christian König" <christian.koenig@amd.com>,
"Leo Liu" <Leo.Liu@amd.com>, "Maíra Canal" <mcanal@igalia.com>,
"Matthew Brost" <matthew.brost@intel.com>,
"Michal Koutný" <mkoutny@suse.com>,
"Michel Dänzer" <michel.daenzer@mailbox.org>,
"Pierre-Eric Pelloux-Prayer" <pierre-eric.pelloux-prayer@amd.com>,
"Rob Clark" <robdclark@gmail.com>, "Tejun Heo" <tj@kernel.org>,
"Alexandre Courbot" <acourbot@nvidia.com>,
"Alistair Popple" <apopple@nvidia.com>,
"John Hubbard" <jhubbard@nvidia.com>,
"Joel Fernandes" <joelagnelf@nvidia.com>,
"Timur Tabi" <ttabi@nvidia.com>,
"Alex Deucher" <alexander.deucher@amd.com>,
"Lucas De Marchi" <lucas.demarchi@intel.com>,
"Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
"Rodrigo Vivi" <rodrigo.vivi@intel.com>,
"Rob Herring" <robh@kernel.org>,
"Steven Price" <steven.price@arm.com>,
"Liviu Dudau" <liviu.dudau@arm.com>,
"Daniel Almeida" <daniel.almeida@collabora.com>,
"Alice Ryhl" <aliceryhl@google.com>,
"Boqun Feng" <boqunf@netflix.com>,
"Grégoire Péan" <gpean@netflix.com>,
"Simona Vetter" <simona@ffwll.ch>,
airlied@gmail.com
Subject: Re: [RFC v8 00/21] DRM scheduling cgroup controller
Date: Tue, 30 Sep 2025 13:57:36 +0200 [thread overview]
Message-ID: <20250930135736.02b69c65@fedora> (raw)
In-Reply-To: <DD62YFG2CJ36.1NFKRTR2ZKD6V@kernel.org>
On Tue, 30 Sep 2025 12:58:29 +0200
"Danilo Krummrich" <dakr@kernel.org> wrote:
> On Tue Sep 30, 2025 at 12:12 PM CEST, Boris Brezillon wrote:
> > So, my take on that is that what we want ultimately is to have the
> > functionality provided by drm_sched split into different
> > components that can be used in isolation, or combined to provide
> > advanced scheduling.
> >
> > JobQueue:
> > - allows you to queue jobs with their deps
> > - dequeues jobs once their deps are met
> > Not too sure if we want a push or a pull model for the job dequeuing,
> > but the idea is that once the job is dequeued, ownership is passed to
> > the SW entity that dequeued it. Note that I intentionally didn't add
> > the timeout handling here, because dequeueing a job doesn't necessarily
> > mean it's started immediately. If you're dealing with HW queues, you
> > might have to wait for a slot to become available. If you're dealing
> > with something like Mali-CSF, where the number of FW slots is limited,
> > you want to wait for your execution context to be passed to the FW for
> > scheduling, and the final case is full-fledged FW scheduling,
> > where you want things to start as soon as you have space in your FW
> > queue (AKA ring-buffer?).
> >
> > JobHWDispatcher: (not sure about the name, I'm bad at naming things)
> > This object basically pulls ready-jobs from one or multiple JobQueues
> > into its own queue, and waits for a HW slot to become available. If you
> > go for the push model, the job gets pushed to the HW dispatcher queue
> > and waits here until a HW slot becomes available.
> > That's where timeouts should be handled, because the job only becomes
> > active when it gets pushed to a HW slot. I guess if we want a
> > resubmit mechanism, it would have to take place here, but given how
> > tricky this has been, I'd be tempted to leave that to drivers, that is,
> > let them requeue the non-faulty jobs directly to their
> > JobHWDispatcher implementation after a reset.
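A minimal sketch of that dispatcher shape, again with invented names. The point is that a job only starts counting against its timeout when dispatch() moves it into a slot, not when it was queued:

```rust
use std::collections::VecDeque;

// Hypothetical sketch: holds ready jobs until one of a fixed number
// of HW slots frees up. A real driver would arm the per-job timeout
// in dispatch(), since that's when the job actually starts running.
struct JobHwDispatcher {
    slots: usize,            // total HW slots
    active: Vec<u64>,        // job ids currently occupying a slot
    pending: VecDeque<u64>,  // ready jobs waiting for a slot
}

impl JobHwDispatcher {
    fn new(slots: usize) -> Self {
        Self { slots, active: Vec::new(), pending: VecDeque::new() }
    }

    // Push model: a JobQueue hands over a ready job; we own it now.
    fn push_ready(&mut self, job: u64) {
        self.pending.push_back(job);
        self.dispatch();
    }

    // Move jobs into free HW slots (and arm their timeouts here).
    fn dispatch(&mut self) {
        while self.active.len() < self.slots {
            match self.pending.pop_front() {
                Some(job) => self.active.push(job),
                None => break,
            }
        }
    }

    // A job finished: free its slot and refill from the pending list.
    fn complete(&mut self, job: u64) {
        self.active.retain(|&j| j != job);
        self.dispatch();
    }
}
```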
> >
> > FWExecutionContextScheduler: (again, pick a different name if you want)
> > This scheduler doesn't know about jobs, meaning there's a
> > driver-specific entity that needs to dequeue jobs from the JobQueue
> > and push those to the relevant ringbuffer. Once a FWExecutionContext
> > has something to execute, it becomes a candidate for
> > FWExecutionContextScheduler, which gets to decide which set of
> > FWExecutionContexts gets a chance to be scheduled by the FW.
> > That one is for Mali-CSF case I described above, and I'm not too sure
> > we want it to be generic, at least not until we have another GPU driver
> > needing the same kind of scheduling. Again, you want to defer the
> > timeout handling to this component, because the timer should only
> > start/resume when the FWExecutionContext gets scheduled, and it should
> > be paused as soon as the context gets evicted.
>
> This sounds pretty much like the existing design with the Panthor group
> scheduler layered on top of it, no?
Kinda, but with a way to use each component independently.
>
> Though, one of the fundamental problems I'd like to get rid of is that job
> ownership is transferred between two components with fundamentally different
> lifetimes (entity and scheduler).
Can you remind me what the problem is? I thought the lifetime issue was
coming from the fact that the drm_sched ownership model was lax enough
that the job could be owned by both drm_gpu_scheduler and
drm_sched_entity at the same time.
>
> Instead, I think the new Jobqueue should always own and always dispatch jobs
> directly and provide some "control API" to be instructed by an external
> component (orchestrator) on top of it when and to which ring to dispatch jobs.
Feels to me like we're getting back to a model where the JobQueue needs
to know about the upper layer in charge of scheduling. I mean, it
can work, but you're adding some complexity back to JobQueue, which I
was expecting to be a simple FIFO with a dep-tracking logic.
For instance, I'd be curious to know which component is in charge of the
timeout in your orchestrator-based solution? In Philipp's slides it
seemed that the timeout was dealt with at the JobQueue level, but that
wouldn't work for us, because when we push a job to the ringbuf in
panthor, the group this job is queued to might not be active yet. At
the moment we have hacks to pause/resume the drm_sched timers [1] but
this is racy, so I'm really hoping that the new design will let us
control the timeout at the proper level.
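What I mean by "the proper level" is a timeout that only accumulates run time while the context is actually on a slot, so eviction doesn't count against the job. A hypothetical userspace sketch (invented names, std types standing in for kernel timers):

```rust
use std::time::{Duration, Instant};

// Hypothetical sketch: a timeout budget that is only consumed while
// the context is scheduled; pause() on eviction, resume() when the
// context is handed back to a FW/HW slot.
struct PausableTimeout {
    budget: Duration,               // total run time the job may use
    consumed: Duration,             // run time accumulated so far
    running_since: Option<Instant>, // Some(..) while scheduled
}

impl PausableTimeout {
    fn new(budget: Duration) -> Self {
        Self { budget, consumed: Duration::ZERO, running_since: None }
    }

    // Called when the context gets scheduled; no-op if already running.
    fn resume(&mut self) {
        self.running_since.get_or_insert(Instant::now());
    }

    // Called when the context is evicted from its slot.
    fn pause(&mut self) {
        if let Some(start) = self.running_since.take() {
            self.consumed += start.elapsed();
        }
    }

    // Has the job used up its run-time budget?
    fn expired(&self) -> bool {
        let running = self
            .running_since
            .map(|s| s.elapsed())
            .unwrap_or(Duration::ZERO);
        self.consumed + running >= self.budget
    }
}
```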
>
> The group scheduling logic you need for some Mali GPUs can either be implemented
> by hooks into this orchestrator or by a separate component that attaches to the
> same control API of the Jobqueue.
I have a hard time seeing how it can fully integrate into this
orchestrator model. We can hook ourselves in the JobQueue::run_job()
and schedule the group for execution when we queue a job to the
ringbuf, but the group scheduler would still be something on the side.
This is not a big deal, as long as the group scheduler is in charge of
the timeout handling.
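As a sketch of what "on the side" could look like (hypothetical names, plain userspace Rust): the JobQueue calls a dispatch hook when it queues a job to a ringbuf, and the group scheduler only tracks runnable groups from there, keeping slot rotation and timeout handling to itself:

```rust
// Hypothetical hook the JobQueue would call on dispatch; a
// Mali-CSF-style group scheduler living "on the side" can implement
// it to mark the owning group as a scheduling candidate.
trait DispatchHook {
    fn on_job_dispatched(&mut self, group: u32);
}

struct GroupScheduler {
    runnable: Vec<u32>, // groups with something to execute
}

impl DispatchHook for GroupScheduler {
    fn on_job_dispatched(&mut self, group: u32) {
        // Queuing a job to a ringbuf makes the group a candidate for
        // a FW slot; slot rotation and timeout handling would live in
        // this component, not in the JobQueue.
        if !self.runnable.contains(&group) {
            self.runnable.push(group);
        }
    }
}
```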
[1] https://lore-kernel.gnuweeb.org/dri-devel/CAPj87rP=HEfPDX8dDM_-BptLmt054x+WHZdCBZOtdMX=X4VkjA@mail.gmail.com/T/