From: "Christian König" <ckoenig.leichtzumerken@gmail.com>
To: "Tvrtko Ursulin" <tvrtko.ursulin@igalia.com>,
"Christian König" <christian.koenig@amd.com>,
"Tvrtko Ursulin" <tursulin@igalia.com>,
dri-devel@lists.freedesktop.org
Cc: kernel-dev@igalia.com, Alex Deucher <alexander.deucher@amd.com>,
Luben Tuikov <ltuikov89@gmail.com>,
Matthew Brost <matthew.brost@intel.com>,
Daniel Vetter <daniel.vetter@ffwll.ch>,
amd-gfx@lists.freedesktop.org, stable@vger.kernel.org
Subject: Re: [PATCH] drm/scheduler: Fix drm_sched_entity_set_priority()
Date: Wed, 24 Jul 2024 13:16:11 +0200 [thread overview]
Message-ID: <fb7b7ce4-294e-463b-93b7-565099e1c2d8@gmail.com> (raw)
In-Reply-To: <ecc032cd-d595-4f4d-a96a-bee51f290547@igalia.com>
On 24.07.24 at 10:16, Tvrtko Ursulin wrote:
> [SNIP]
>> Absolutely.
>
> Absolutely good and absolutely me, or absolutely you? :)
You, I don't even have time to finish all the stuff I already started :/
>
> These are the TODO points and their opens:
>
> - Adjust amdgpu_ctx_set_entity_priority() to call
> drm_sched_entity_modify_sched() regardless of the hw_type - to fix
> priority changes on a single sched other than gfx or compute.
Either that or to stop using the scheduler priority to implement
userspace priorities and always use different HW queues for that.
>
> - Document that the sched_list array lifetime must align with the
> entity's, and adjust the callers.
>
> Open:
>
> Do you still oppose keeping sched_list for num_scheds == 1?
Not if you can fix up all callers.
> If so, do you propose drm_sched_entity_modify_sched() keeps disagreeing
> with drm_sched_entity_init() on this detail, and keeps the "one shot
> single sched_list" quirk in? Why is that nicer than simply keeping the
> list and removing the quirk? Once the lifetime rules are clear it is
> IMO okay to always keep the list.
Yeah if every caller of drm_sched_entity_init() can be fixed I'm fine
with that as well.
>
> - Remove drm_sched_entity_set_priority().
>
> Open:
>
> Should we at this point also modify amdgpu_device_init_schedulers() to
> stop initialising schedulers with DRM_SCHED_PRIORITY_COUNT run queues?
One step at a time.
>
> Once drm_sched_entity_set_priority() is gone there is little point.
> Unless there are some non-obvious games with the kernel priority or
> something.
>
>>>>>> In general scheduler priorities were meant to be used for things
>>>>>> like kernel queues which would always have higher priority than
>>>>>> user space submissions and using them for userspace turned out to
>>>>>> be not such a good idea.
>>>>>
>>>>> Out of curiosity, what were the problems? I cannot think of
>>>>> anything fundamental, since there are priorities at the backend
>>>>> level after all, just with fewer levels.
>>>>
>>>> A higher level queue can starve lower level queues to the point
>>>> that they never get a chance to get anything running.
>>>
>>> Oh that.. well I call that an implementation detail. :) Because
>>> nowhere in the uapi is it, I think, guaranteed that execution
>>> ordering needs to be strictly in descending priority. This
>>> potentially goes back to what you said above, that a larger rewrite
>>> might be beneficial, implementing some smarter scheduling. Although
>>> the issue will be that it is still only frontend scheduling, so it
>>> is a bit questionable to invest in making it too smart.
>>
>> +1 and we have a bug report complaining that RR is in at least one
>> situation better than FIFO. So it is actually quite hard to remove.
>>
>> On the other hand FIFO has some really nice properties as well. E.g.
>> try to run >100 glxgears instances (on weaker hw) and just visually
>> compare the result differences between RR and FIFO. FIFO is baby
>> smooth and RR is basically stuttering all the time.
>>
>>> I think this goes more back to what I was suggesting during the
>>> early xe days, that the drm scheduler should potentially be split
>>> into a dependency handling part and a scheduling part. Drivers with
>>> a 1:1 entity:scheduler mapping and full hardware/firmware scheduling
>>> do not really need either FIFO or RR.
>>
>> Yeah that's my thinking as well and I also suggested that multiple
>> times in discussions with Sima and others.
>>
>>>
>>>> This basically means that userspace gets a chance to submit
>>>> infinite fences, with all the bad consequences.
>>>>
>>>>>
>>>>> I mean one problem unrelated to this discussion is this:
>>>>>
>>>>> void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
>>>>> {
>>>>>         struct dma_fence *fence;
>>>>>         struct drm_gpu_scheduler *sched;
>>>>>         struct drm_sched_rq *rq;
>>>>>
>>>>>         /* queue non-empty, stay on the same engine */
>>>>>         if (spsc_queue_count(&entity->job_queue))
>>>>>                 return;
>>>>>
>>>>> Which makes it look like an entity with a constant trickle of
>>>>> jobs will never actually manage to change its run queue. Neither
>>>>> today, nor after the potential drm_sched_entity_set_priority()
>>>>> removal.
>>>>
>>>> That's intentional and based on how the scheduler load balancing
>>>> works.
>>>
>>> I see that it is intentional but if it can silently prevent priority
>>> changes (even hw priority) it is not very solid. Unless I am missing
>>> something here.
>>
>> IIRC the GSoC student who implemented this (with me as mentor)
>> actually documented that behavior somewhere.
>>
>> And to be honest it kind of makes sense, because switching priorities
>> would otherwise be disruptive: e.g. there is a moment where you need
>> to drain all previous submissions with the old priority before you
>> can do new ones with the new priority.
>
> Hmmm I don't see how it makes sense. Perhaps a test case for
> AMDGPU_SCHED_OP_*_PRIORITY_OVERRIDE is missing to show how it doesn't
> work, or at least how easily it can be defeated with the callers none
> the wiser.
Ok, that needs a bit of a longer explanation. You don't by any chance have Teams?
>
> For context I am kind of interested because I wired up amdgpu to the
> DRM cgroup controller and use priority override to de-prioritize
> certain cgroups and it kind of works. But again, it will not be great
> if a client with a constant trickle of submissions can just defeat it.
Yeah, exactly that use case is currently not possible :(
Regards,
Christian.
>
> Regards,
>
> Tvrtko
Thread overview: 12+ messages
2024-07-19 9:47 [PATCH] drm/scheduler: Fix drm_sched_entity_set_priority() Tvrtko Ursulin
2024-07-19 13:02 ` Christian König
2024-07-19 15:18 ` Christian König
2024-07-19 17:39 ` Matthew Brost
2024-07-22 8:12 ` AW: " Koenig, Christian
2024-07-22 13:52 ` Tvrtko Ursulin
2024-07-22 14:06 ` Christian König
2024-07-22 14:43 ` Tvrtko Ursulin
2024-07-22 15:13 ` Christian König
2024-07-24 8:16 ` Tvrtko Ursulin
2024-07-24 11:16 ` Christian König [this message]
2024-07-26 9:02 ` Tvrtko Ursulin