From: "Christian König" <ckoenig.leichtzumerken@gmail.com>
To: "Tvrtko Ursulin" <tvrtko.ursulin@igalia.com>,
"Christian König" <christian.koenig@amd.com>,
"Tvrtko Ursulin" <tursulin@igalia.com>,
dri-devel@lists.freedesktop.org
Cc: kernel-dev@igalia.com, Alex Deucher <alexander.deucher@amd.com>,
Luben Tuikov <ltuikov89@gmail.com>,
Matthew Brost <matthew.brost@intel.com>,
Daniel Vetter <daniel.vetter@ffwll.ch>,
amd-gfx@lists.freedesktop.org, stable@vger.kernel.org
Subject: Re: [PATCH] drm/scheduler: Fix drm_sched_entity_set_priority()
Date: Wed, 24 Jul 2024 13:16:11 +0200 [thread overview]
Message-ID: <fb7b7ce4-294e-463b-93b7-565099e1c2d8@gmail.com> (raw)
In-Reply-To: <ecc032cd-d595-4f4d-a96a-bee51f290547@igalia.com>
On 24.07.24 at 10:16, Tvrtko Ursulin wrote:
> [SNIP]
>> Absolutely.
>
> Absolutely good and absolutely me, or absolutely you? :)
You, I don't even have time to finish all the stuff I already started :/
>
> These are the TODO points and their opens:
>
> - Adjust amdgpu_ctx_set_entity_priority() to call
> drm_sched_entity_modify_sched() regardless of the hw_type - to fix
> priority changes on a single sched other than gfx or compute.
Either that or to stop using the scheduler priority to implement
userspace priorities and always use different HW queues for that.
>
> - Document that the sched_list array lifetime must align with the
> entity's, and adjust the callers.
>
> Open:
>
> Do you still oppose keeping sched_list for num_scheds == 1?
Not if you can fix up all callers.
> If so, do you propose drm_sched_entity_modify_sched() keeps disagreeing
> with drm_sched_entity_init() on this detail, and keeps the "one shot
> single sched_list" quirk in? Why is that nicer than simply keeping the
> list and removing the quirk? Once the lifetime rules are clear it is
> IMO okay to always keep the list.
Yeah if every caller of drm_sched_entity_init() can be fixed I'm fine
with that as well.
>
> - Remove drm_sched_entity_set_priority().
>
> Open:
>
> Should we at this point also modify amdgpu_device_init_schedulers() to
> stop initialising schedulers with DRM_SCHED_PRIORITY_COUNT run queues?
One step at a time.
>
> Once drm_sched_entity_set_priority() is gone there is little point.
> Unless there are some non-obvious games with the kernel priority or
> something.
>
>>>>>> In general scheduler priorities were meant to be used for things
>>>>>> like kernel queues which would always have higher priority than
>>>>>> user space submissions and using them for userspace turned out to
>>>>>> be not such a good idea.
>>>>>
>>>>> Out of curiosity, what were the problems? I cannot think of
>>>>> anything fundamental, since there are priorities at the backend
>>>>> level after all, just with fewer levels.
>>>>
>>>> A higher level queue can starve lower level queues to the point
>>>> that they never get a chance to get anything running.
>>>
>>> Oh that.. well I call that an implementation detail. :) Because
>>> nowhere in the uapi is it, I think, guaranteed that execution
>>> ordering needs to be strictly in descending priority. This
>>> potentially goes back to what you said above, that a larger rewrite
>>> might be beneficial, implementing some smarter scheduling. Although
>>> the issue will be that it is still only frontend scheduling, so it
>>> is a bit questionable to invest in making it too smart.
>>
>> +1 and we have a bug report complaining that RR is in at least one
>> situation better than FIFO. So it is actually quite hard to remove.
>>
>> On the other hand FIFO has some really nice properties as well. E.g.
>> try to run >100 glxgears instances (on weaker hw) and just visually
>> compare the result differences between RR and FIFO. FIFO is baby
>> smooth and RR is basically stuttering all the time.
>>
>>> I think this goes more back to what I was suggesting during the
>>> early xe days, that the drm scheduler should potentially be split
>>> into a dependency handling part and a scheduling part. Drivers with
>>> a 1:1 entity:scheduler mapping and full hardware/firmware scheduling
>>> do not really need either FIFO or RR.
>>
>> Yeah that's my thinking as well and I also suggested that multiple
>> times in discussions with Sima and others.
>>
>>>
>>>> This basically means that userspace gets a chance to submit
>>>> infinite fences, with all the bad consequences.
>>>>
>>>>>
>>>>> I mean one problem unrelated to this discussion is this:
>>>>>
>>>>> void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
>>>>> {
>>>>>         struct dma_fence *fence;
>>>>>         struct drm_gpu_scheduler *sched;
>>>>>         struct drm_sched_rq *rq;
>>>>>
>>>>>         /* queue non-empty, stay on the same engine */
>>>>>         if (spsc_queue_count(&entity->job_queue))
>>>>>                 return;
>>>>>
>>>>> Which makes it look like an entity with a constant trickle of
>>>>> jobs will never actually manage to change its run queue. Neither
>>>>> today, nor after the potential drm_sched_entity_set_priority()
>>>>> removal.
>>>>
>>>> That's intentional and based on how the scheduler load balancing
>>>> works.
>>>
>>> I see that it is intentional but if it can silently prevent priority
>>> changes (even hw priority) it is not very solid. Unless I am missing
>>> something here.
>>
>> IIRC the GSoC student who implemented this (with me as mentor)
>> actually documented that behavior somewhere.
>>
>> And to be honest it kind of makes sense, because switching priorities
>> would otherwise be disruptive: e.g. there is a moment where you need
>> to drain all previous submissions with the old priority before you
>> can do new ones with the new priority.
>
> Hmmm I don't see how it makes sense. Perhaps a test case for
> AMDGPU_SCHED_OP_*_PRIORITY_OVERRIDE is missing to show how it doesn't
> work, or at least how easily it can be defeated with the callers none
> the wiser.
Ok, that needs a bit of a longer explanation. You don't by any chance have Teams?
>
> For context I am kind of interested because I wired up amdgpu to the
> DRM cgroup controller and use priority override to de-prioritize
> certain cgroups and it kind of works. But again, it will not be great
> if a client with a constant trickle of submissions can just defeat it.
Yeah, exactly that use case is currently not possible :(
Regards,
Christian.
>
> Regards,
>
> Tvrtko
Thread overview: 12+ messages
2024-07-19 9:47 [PATCH] drm/scheduler: Fix drm_sched_entity_set_priority() Tvrtko Ursulin
2024-07-19 13:02 ` Christian König
2024-07-19 15:18 ` Christian König
2024-07-19 17:39 ` Matthew Brost
2024-07-22 8:12 ` AW: " Koenig, Christian
2024-07-22 13:52 ` Tvrtko Ursulin
2024-07-22 14:06 ` Christian König
2024-07-22 14:43 ` Tvrtko Ursulin
2024-07-22 15:13 ` Christian König
2024-07-24 8:16 ` Tvrtko Ursulin
2024-07-24 11:16 ` Christian König [this message]
2024-07-26 9:02 ` Tvrtko Ursulin