From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
To: Danilo Krummrich <dakr@kernel.org>
Cc: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org,
kernel-dev@igalia.com, intel-xe@lists.freedesktop.org,
cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
"Christian König" <christian.koenig@amd.com>,
"Leo Liu" <Leo.Liu@amd.com>, "Maíra Canal" <mcanal@igalia.com>,
"Matthew Brost" <matthew.brost@intel.com>,
"Michal Koutný" <mkoutny@suse.com>,
"Michel Dänzer" <michel.daenzer@mailbox.org>,
"Philipp Stanner" <phasta@kernel.org>,
"Pierre-Eric Pelloux-Prayer" <pierre-eric.pelloux-prayer@amd.com>,
"Rob Clark" <robdclark@gmail.com>, "Tejun Heo" <tj@kernel.org>,
"Alexandre Courbot" <acourbot@nvidia.com>,
"Alistair Popple" <apopple@nvidia.com>,
"John Hubbard" <jhubbard@nvidia.com>,
"Joel Fernandes" <joelagnelf@nvidia.com>,
"Timur Tabi" <ttabi@nvidia.com>,
"Alex Deucher" <alexander.deucher@amd.com>,
"Lucas De Marchi" <lucas.demarchi@intel.com>,
"Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
"Rodrigo Vivi" <rodrigo.vivi@intel.com>,
"Boris Brezillon" <boris.brezillon@collabora.com>,
"Rob Herring" <robh@kernel.org>,
"Steven Price" <steven.price@arm.com>,
"Liviu Dudau" <liviu.dudau@arm.com>,
"Daniel Almeida" <daniel.almeida@collabora.com>,
"Alice Ryhl" <aliceryhl@google.com>,
"Boqun Feng" <boqunf@netflix.com>,
"Grégoire Péan" <gpean@netflix.com>
Subject: Re: [RFC v8 00/21] DRM scheduling cgroup controller
Date: Thu, 23 Oct 2025 12:18:16 +0100 [thread overview]
Message-ID: <a3c7b8f7-0f2c-4e0c-a55d-3e4433f795db@igalia.com> (raw)
In-Reply-To: <DD5CCG4MIODH.1718JI1Z7GH8T@kernel.org>
On 29/09/2025 15:07, Danilo Krummrich wrote:
> On Wed Sep 3, 2025 at 5:23 PM CEST, Tvrtko Ursulin wrote:
>> This is another respin of this old work^1 which since v7 is a total rewrite and
>> completely changes how the control is done.
>
> I only got some of the patches of the series, can you please send all of them
> for subsequent submissions? You may also want to consider resending if you're
> not getting a lot of feedback due to that. :)
There are so many Ccs across the series that I am reluctant to copy
everyone on all patches. So I count on people being subscribed to the
mailing lists and, failing that, being able to find the rest in the archives.
Regarding the lukewarm response, here is a short video showing it in action:
https://people.igalia.com/tursulin/drm_cgroup.mp4
Please ignore the typos in the video commentary; I would say it is
worth a watch.
Let's see if that helps paint a picture of what this can do. With a
little imagination other use cases become obvious as well, for example
starting a compute job in the background while the UI stays responsive.
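For the curious, the intended usage is plain cgroup v2. A configuration sketch (the drm controller only exists with this series applied, the weight values are arbitrary, and my_compute_job stands in for any workload):

```shell
# Enable the (proposed) drm controller for children of the root cgroup.
echo "+drm" | sudo tee /sys/fs/cgroup/cgroup.subtree_control

# One group for the desktop session, one for background compute.
sudo mkdir -p /sys/fs/cgroup/desktop /sys/fs/cgroup/compute
echo 500 | sudo tee /sys/fs/cgroup/desktop/drm.weight   # interactive
echo  20 | sudo tee /sys/fs/cgroup/compute/drm.weight   # background

# Run the compute job from the low-weight group.
echo $$ | sudo tee /sys/fs/cgroup/compute/cgroup.procs
my_compute_job &
```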
>> On the userspace interface side of things it is the same as before. We have
>> drm.weight as an interface, taking integers from 1 to 10000, the same as CPU and
>> IO cgroup controllers.
>
> In general, I think it would be good to get GPU vendors to speak up to what kind
> of interfaces they're heading to with firmware schedulers and potential firmware
> APIs to control scheduling; especially given that this will be a uAPI.
>
> (Adding a couple of folks to Cc.)
>
> Having that said, I think the basic drm.weight interface is fine and should work
> in any case; i.e. with the existing DRM GPU scheduler in both modes, the
> upcoming DRM Jobqueue efforts and should be generic enough to work with
> potential firmware interfaces we may see in the future.

Yes, basic drm.weight should not be controversial at all.
For all drivers which use the DRM scheduler in the 1:N mode it is
trivial to wire up the support once the "fair" DRM scheduler lands.
Trivial because the scheduling weight is directly compatible with the
virtual GPU time accounting the fair scheduler implements. This series
has an example of how to do it for amdgpu, and many other simple
drivers could do it exactly the same way with a few lines of
boilerplate code.
For some 1:1 firmware scheduling drivers, xe for example, the series
also includes a sketch of how they could make use of drm.weight by
giving the firmware a hint about what is most important and what is
least important. In practice that is also usable for some use cases.
(In fact the demo video above was made with xe! Results with amdgpu
are pretty similar, but I hit some snags with screen recording on that
device.)
Possibly the main problem behind the lukewarm response, as far as I
understood at XDC last month, is what to do about the drivers where
seemingly neither approach can be implemented. Nouveau, for example:
the thinking seems to be that it could not be wired up there. I do not
know that driver, nor the hardware (firmware), so I cannot say.
To address that concern, one idea I had is to expose a new control
file, drm.weight_supported or similar, with semantics along the lines of:

  + - all active clients/drivers in the cgroup support the feature
  ? - a mix of supported and unsupported
  - - none support the drm.weight feature
That would give visibility into the "why is this thing not doing
anything on my system?" question. Then, over time, solutions for
supporting even the problematic drivers with closed firmware could be
found. There will certainly be motivation not to be the driver with
the worst user experience.
Regards,
Tvrtko
Thread overview: 32+ messages
2025-09-03 15:23 [RFC v8 00/21] DRM scheduling cgroup controller Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 01/21] drm/sched: Add some scheduling quality unit tests Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 02/21] drm/sched: Add some more " Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 03/21] drm/sched: Implement RR via FIFO Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 04/21] drm/sched: Consolidate entity run queue management Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 05/21] drm/sched: Move run queue related code into a separate file Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 06/21] drm/sched: Free all finished jobs at once Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 07/21] drm/sched: Account entity GPU time Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 08/21] drm/sched: Remove idle entity from tree Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 09/21] drm/sched: Add fair scheduling policy Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 10/21] drm/sched: Break submission patterns with some randomness Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 11/21] drm/sched: Remove FIFO and RR and simplify to a single run queue Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 12/21] drm/sched: Embed run queue singleton into the scheduler Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 13/21] cgroup: Add the DRM cgroup controller Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 14/21] cgroup/drm: Track DRM clients per cgroup Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 15/21] cgroup/drm: Add scheduling weight callback Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 16/21] cgroup/drm: Introduce weight based scheduling control Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 17/21] drm/sched: Add helper for tracking entities per client Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 18/21] drm/sched: Add helper for DRM cgroup controller weight notifications Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 19/21] drm/amdgpu: Register with the DRM scheduling cgroup controller Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 20/21] drm/xe: Allow changing GuC scheduling priority Tvrtko Ursulin
2025-09-03 15:23 ` [RFC 21/21] drm/xe: Register with the DRM scheduling cgroup controller Tvrtko Ursulin
2025-09-04 12:08 ` Tvrtko Ursulin
2025-09-29 14:07 ` [RFC v8 00/21] " Danilo Krummrich
2025-09-30 9:00 ` Philipp Stanner
2025-09-30 9:28 ` DRM Jobqueue design (was "[RFC v8 00/21] DRM scheduling cgroup controller") Danilo Krummrich
2025-09-30 10:12 ` [RFC v8 00/21] DRM scheduling cgroup controller Boris Brezillon
2025-09-30 10:58 ` Danilo Krummrich
2025-09-30 11:57 ` Boris Brezillon
2025-10-07 14:44 ` Danilo Krummrich
2025-10-07 15:44 ` Boris Brezillon
2025-10-23 11:18 ` Tvrtko Ursulin [this message]