From: Tejun Heo <tj@kernel.org>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: "Rob Clark" <robdclark@chromium.org>,
Kenny.Ho@amd.com, "Daniel Vetter" <daniel.vetter@ffwll.ch>,
Intel-gfx@lists.freedesktop.org,
"Johannes Weiner" <hannes@cmpxchg.org>,
linux-kernel@vger.kernel.org,
"Stéphane Marchesin" <marcheu@chromium.org>,
"Christian König" <christian.koenig@amd.com>,
"Zefan Li" <lizefan.x@bytedance.com>,
"Dave Airlie" <airlied@redhat.com>,
cgroups@vger.kernel.org, "T . J . Mercier" <tjmercier@google.com>
Subject: Re: [Intel-gfx] [RFC 00/17] DRM scheduling cgroup controller
Date: Wed, 19 Oct 2022 08:45:34 -1000
Message-ID: <Y1BFziiJdBzsIJWH@slm.duckdns.org>
In-Reply-To: <20221019173254.3361334-1-tvrtko.ursulin@linux.intel.com>

Hello,

On Wed, Oct 19, 2022 at 06:32:37PM +0100, Tvrtko Ursulin wrote:
...
> DRM static priority interface files
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> drm.priority_levels
> One of:
> 1) An integer representing the minimum number of discrete priority
> levels for the whole group.
> Optionally followed by an asterisk ('*') indicating some DRM clients
> in the group support more than the minimum number.
> 2) '0' - indicating one or more DRM clients in the group have no support
> for static priority control.
> 3) 'n/a' - when there are no DRM clients in the configured group.
>
> drm.priority
> A read-write integer between -10000 and 10000 (inclusive) representing
> an abstract static priority level.
>
> drm.effective_priority
> Read-only integer showing the current effective priority level for the
> group. Effective meaning taking into account the chain of inherited

From an interface POV, this is a lot worse than the second proposal and I'd
really like to avoid this. Even if we go with mapping user priority
configuration to per-driver priorities, I'd much prefer that the interface
presented to the user be weight based, and let each driver try to match the
resulting hierarchical weight (i.e. the absolute proportion a given cgroup
should have at that point in time) as best it can, rather than exposing
opaque priority numbers to userspace whose meaning isn't defined at all.
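
For example (weights and group names purely illustrative): with
drm.weight=100 on cgroup A and drm.weight=300 on its sibling B, and A's
children A1 and A2 both at equal weight, A1's flattened share works out to
100/400 * 1/2 = 12.5% of the device, and that 12.5% is the single number a
driver would be asked to aim for.
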
> DRM scheduling soft limits interface files
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> drm.weight
> Standard cgroup weight based control [1, 10000] used to configure the
> relative distribution of GPU time between the sibling groups.
Please take a look at io.weight. This can follow the same convention to
express both global and per-device weights.
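
That is, something along the lines of the io.weight syntax - a "default"
weight plus optional per-device overrides keyed by MAJOR:MINOR. The values
below are only an illustration of the convention, not what the series
implements:

  default 100
  226:0 200
  226:1 50
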
> drm.period_us
> An integer representing the period with which the controller should look
> at the GPU usage by the group and potentially send the over/under budget
> signal.
> Value of zero (default) disables the soft limit checking.

Can we not do period_us, or at least make it a per-driver tuning parameter
exposed as a module param? Weight is something users can easily understand
and configure; period_us is much more of an implementation detail. If we want
to express the trade-off between latency and bandwidth at the interface, we
should probably encode the latency requirement in a more canonical way, but
let's leave that for the future.
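
If the period turns out to need tuning at all, something like the following
would keep it out of the cgroup interface (just a sketch, the parameter name
is made up):

  #include <linux/module.h>

  /* Illustrative only: scanning period as a driver tunable rather than
   * a cgroup interface file. Zero disables the soft limit checking. */
  static unsigned int drmcg_period_us = 10000;
  module_param(drmcg_period_us, uint, 0644);
  MODULE_PARM_DESC(drmcg_period_us,
                   "DRM cgroup budget scanning period in microseconds (0=off)");
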
> drm.budget_supported
> One of:
> 1) 'yes' - when all DRM clients in the group support the functionality.
> 2) 'no' - when at least one of the DRM clients does not support the
> functionality.
> 3) 'n/a' - when there are no DRM clients in the group.

Yeah, I'm not sure about this. This isn't a per-cgroup property to begin
with, and I'm not sure that 'no', meaning at least one device doesn't support
it, is intuitive. The distinction between 'no' and 'n/a' is kinda weird too.
Please drop this.
Another basic interface question. Is everyone happy with the drm prefix or
should it be something like gpu? Also, in the future, if there's a consensus
around how to control gpu memory, what prefix would that take?
> The second proposal is a little bit more advanced in concept and also a little
> bit less finished. The interesting thing is that it builds upon the per-client
> GPU utilisation work which landed recently for a few drivers. So my thinking is
> that, in principle, the intersection of drivers which support both that and some
> sort of priority scheduling control could also in theory support this.
>
> Another really interesting angle for this controller is that it mimics the same
> control method used by the CPU scheduler, that is, proportional/weight-based
> GPU time budgeting, which makes it easy to configure and does not need a new
> mental model.
>
> However, as the introduction mentions, GPUs are much more heterogeneous and
> therefore the controller uses very "soft" wording as to what it promises. The
> general statement is that it can define budgets, notify clients when they are
> over them, and let individual drivers implement best effort handling of those
> conditions.
>
> Delegation of duties in the implementation goes like this:
>
> * DRM cgroup controller implements the control files and the scanning loop.
> * DRM core is required to track all DRM clients belonging to processes so it
> can answer when asked how much GPU time a process is using.
> * DRM core also provides a callback which the controller will call when a
> certain process is over budget.
> * Individual drivers need to implement two similar hooks, but which work on
> a single DRM client: an over-budget callback and a GPU utilisation query.
>
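
Just so we're talking about the same shape of thing, those two per-client
hooks could look roughly like this (a sketch only; the names and signatures
here are made up, not necessarily what the series defines):

  #include <linux/types.h>

  struct drm_file;

  /* Hypothetical per-driver hooks operating on a single DRM client. */
  struct drm_cgroup_ops {
          /* Return GPU time consumed by this client, in microseconds. */
          u64 (*active_time_us)(struct drm_file *file_priv);

          /* Called when the client's cgroup is over its GPU time budget. */
          void (*signal_budget)(struct drm_file *file_priv,
                                u64 used_us, u64 budget_us);
  };
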
> What I have demonstrated in practice is that when wired to i915, in a really
> primitive way where the over-budget condition simply lowers the scheduling
> priority, the concept can be almost as effective as the static priority
> control. I say almost because the design where budget control depends on the
> periodic usage scanning has a fundamental delay, so responsiveness will depend
> on the scanning period, which may or may not be a problem for a particular use
> case.
>
> The unfinished part is the GPU budgeting split, which currently does not
> propagate unused bandwidth to children, nor can it share it with siblings. But
> this is not due to fundamental reasons, just to avoid spending too much time on
> it too early.
Rather than doing it hierarchically on the spot, it's usually a lot cheaper
and easier to calculate the flattened hierarchical weight per leaf cgroup
and divide the bandwidth according to the eventual portions. For an example,
please take a look at block/blk-iocost.c.
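
Roughly, a leaf's absolute share is just the product of its weight fraction
at each level of the hierarchy, something like the following (a plain C
sketch for illustration, not the blk-iocost code):

  #include <stdint.h>

  /* Illustrative only: one node in the cgroup weight hierarchy. */
  struct wnode {
          const struct wnode *parent;
          uint32_t weight;           /* configured weight, e.g. 1..10000 */
          uint32_t siblings_weight;  /* sum of weights of node + siblings */
  };

  /* Flattened hierarchical weight of a leaf, scaled to 0..10000. */
  static uint32_t leaf_hweight(const struct wnode *leaf)
  {
          uint64_t hw = 10000;
          const struct wnode *n;

          for (n = leaf; n->parent; n = n->parent)
                  hw = hw * n->weight / n->siblings_weight;

          return (uint32_t)hw;
  }

The controller can then hand each driver that single flattened number and let
it divide device time accordingly.
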
I don't know much about the drm driver side, so I can't comment much on it,
but I do really like the idea of having the core implementation determine who
should get how much and then letting each driver enforce the target. That
seems a lot more robust and generic than trying to somehow coax and expose
per-driver priority implementations directly.
Thanks.
--
tejun