AMD-GFX Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: "Danilo Krummrich" <dakr@kernel.org>
To: "Tvrtko Ursulin" <tvrtko.ursulin@igalia.com>
Cc: phasta@kernel.org, amd-gfx@lists.freedesktop.org,
	dri-devel@lists.freedesktop.org, kernel-dev@igalia.com,
	"Christian König" <christian.koenig@amd.com>,
	"Matthew Brost" <matthew.brost@intel.com>,
	"Pierre-Eric Pelloux-Prayer" <pierre-eric.pelloux-prayer@amd.com>
Subject: Re: [PATCH v2 09/27] drm/sched: Add fair scheduling policy
Date: Wed, 22 Oct 2025 16:44:22 +0200	[thread overview]
Message-ID: <DDOXJCVU8HVW.S58H517LVMEV@kernel.org> (raw)
In-Reply-To: <c836e71d-9cde-4379-9905-0fd881a252dd@igalia.com>

On Wed Oct 22, 2025 at 4:24 PM CEST, Tvrtko Ursulin wrote:
>
> On 22/10/2025 15:03, Danilo Krummrich wrote:
>> On Wed Oct 22, 2025 at 3:50 PM CEST, Tvrtko Ursulin wrote:
>>> Yes, for the case when entity joins the run-queue it can be the same
>>> entity which is now the head of the queue, or it can be a different one.
>>> Depends on the insertion position.
>>>
>>> But for the case where entity is leaving the run queue it is always a
>>> different entity and therefore a lock inversion. We have essentially this:
>>>
>>> lock entity
>>> lock rq
>>> remove entity from the rq
>>> rq->prio = rq->head_entity->prio // different entity, unlocked read
>>> unlock rq
>>> unlock entity
>> 
>> This sounds like it repeates the unclear locking situation that is also
>> documented for struct drm_sched_rq:
>> 
>> 	 * FIXME: Locking is very unclear for this. Writers are protected by
>> 	 * @lock, but readers are generally lockless and seem to just race with
>> 	 * not even a READ_ONCE.
>> 
>> This sounds pretty suspicious to me and I think it indicates a more fundamental
>> design issue that you now end up working around now.
>
> I'm afraid it is not nearly the same. Guarantee that entity->rq is 
> stable is a multi-step one which depends on the job queue being non 
> empty and the last submitted job not being signalled. That side even 
> includes a smp_rmb() in drm_sched_entity_select_rq(). Code which does 
> the suspicious unlocked entity->rq access therefore claims to be certain 
> one or both of those conditions must be true.
>
> What I am doing here is way, way simpler and IMO should not 
> controversial. It is well defined that entities can only enter and exit 
> the run queue with the rq->lock held. Which the code path holds, and the 
> functions asserts for. So a lockless read of an integer is nowhere near 
> the complexities of the FIXME you quote.

What I'm saying is that the pattern is the same, the writer side protects an
entity field with a lock, whereas the reader side does not.

And the fact that this is done for locking inversion reasons indicates a design
issue -- not necessarily with your code, but maybe with the existing code.

>> I'd like to dig in a bit more, but unfortunately it's very unlikely I will have
>> the time to do this until after LPC.
>
> Should I interpret this as putting a blocker on the series until 
> effectively 2026?

It means that I won't have time to help by digging in and see if it's in fact an
issue you inherit from the existing code.

  reply	other threads:[~2025-10-22 14:44 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-17 13:36 [PATCH v2 00/27] Fair DRM scheduler Tvrtko Ursulin
2025-10-17 13:36 ` [PATCH v2 01/27] drm/sched: Consolidate entity run queue management Tvrtko Ursulin
2025-10-17 13:36 ` [PATCH v2 02/27] drm/sched: Move run queue related code into a separate file Tvrtko Ursulin
2025-10-17 13:36 ` [PATCH v2 03/27] drm/sched: Add some scheduling quality unit tests Tvrtko Ursulin
2025-10-17 13:36 ` [PATCH v2 04/27] drm/sched: Add some more " Tvrtko Ursulin
2025-10-17 13:36 ` [PATCH v2 05/27] drm/sched: Implement RR via FIFO Tvrtko Ursulin
2025-10-17 13:36 ` [PATCH v2 06/27] drm/sched: Free all finished jobs at once Tvrtko Ursulin
2025-10-17 13:36 ` [PATCH v2 07/27] drm/sched: Account entity GPU time Tvrtko Ursulin
2025-10-17 13:36 ` [PATCH v2 08/27] drm/sched: Remove idle entity from tree Tvrtko Ursulin
2025-10-17 13:36 ` [PATCH v2 09/27] drm/sched: Add fair scheduling policy Tvrtko Ursulin
2025-10-20 13:57   ` Philipp Stanner
2025-10-20 14:39     ` Tvrtko Ursulin
2025-10-22  6:56       ` Philipp Stanner
2025-10-22 13:50         ` Tvrtko Ursulin
2025-10-22 14:03           ` Danilo Krummrich
2025-10-22 14:24             ` Tvrtko Ursulin
2025-10-22 14:44               ` Danilo Krummrich [this message]
2025-10-17 13:36 ` [PATCH v2 10/27] drm/sched: Favour interactive clients slightly Tvrtko Ursulin
2025-10-17 13:36 ` [PATCH v2 11/27] drm/sched: Switch default policy to fair Tvrtko Ursulin
2025-10-17 13:36 ` [PATCH v2 12/27] drm/sched: Remove FIFO and RR and simplify to a single run queue Tvrtko Ursulin
2025-10-17 13:36 ` [PATCH v2 13/27] drm/sched: Embed run queue singleton into the scheduler Tvrtko Ursulin
2025-10-17 13:36 ` [PATCH v2 14/27] accel/amdxdna: Remove drm_sched_init_args->num_rqs usage Tvrtko Ursulin
2025-10-17 15:55   ` Lizhi Hou
2025-10-17 13:36 ` [PATCH v2 15/27] accel/rocket: " Tvrtko Ursulin
2025-10-18 14:32   ` Tomeu Vizoso
2025-10-17 13:36 ` [PATCH v2 16/27] drm/amdgpu: " Tvrtko Ursulin
2025-10-17 13:36 ` [PATCH v2 17/27] drm/etnaviv: " Tvrtko Ursulin
2025-10-17 13:36 ` [PATCH v2 18/27] drm/imagination: " Tvrtko Ursulin
2025-10-17 13:36 ` [PATCH v2 19/27] drm/lima: " Tvrtko Ursulin
2025-10-17 13:36 ` [PATCH v2 20/27] drm/msm: " Tvrtko Ursulin
2025-10-17 13:36 ` [PATCH v2 21/27] drm/nouveau: " Tvrtko Ursulin
2025-10-22 19:51   ` Lyude Paul
2025-10-17 13:36 ` [PATCH v2 22/27] drm/panfrost: " Tvrtko Ursulin
2025-10-17 13:36 ` [PATCH v2 23/27] drm/panthor: " Tvrtko Ursulin
2025-10-17 13:36 ` [PATCH v2 24/27] drm/sched: " Tvrtko Ursulin
2025-10-17 13:36 ` [PATCH v2 25/27] drm/v3d: " Tvrtko Ursulin
2025-10-17 13:36 ` [PATCH v2 26/27] drm/xe: " Tvrtko Ursulin
2025-10-17 13:36 ` [PATCH v2 27/27] drm/sched: Remove drm_sched_init_args->num_rqs Tvrtko Ursulin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DDOXJCVU8HVW.S58H517LVMEV@kernel.org \
    --to=dakr@kernel.org \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=christian.koenig@amd.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=kernel-dev@igalia.com \
    --cc=matthew.brost@intel.com \
    --cc=phasta@kernel.org \
    --cc=pierre-eric.pelloux-prayer@amd.com \
    --cc=tvrtko.ursulin@igalia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox