From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 36485D41177 for ; Thu, 15 Jan 2026 11:44:30 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id CFCFE10E11C; Thu, 15 Jan 2026 11:44:29 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=kernel.org header.i=@kernel.org header.b="oi7JGdzU"; dkim-atps=neutral Received: from sea.source.kernel.org (sea.source.kernel.org [172.234.252.31]) by gabe.freedesktop.org (Postfix) with ESMTPS id 04B0310E11C; Thu, 15 Jan 2026 11:44:29 +0000 (UTC) Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58]) by sea.source.kernel.org (Postfix) with ESMTP id AFDB443BAF; Thu, 15 Jan 2026 11:44:28 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0E30AC116D0; Thu, 15 Jan 2026 11:44:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1768477468; bh=us7oGf8s7A31Hz7qsGcqOwfc6BdegtJ5RHsN+2wnK5U=; h=Date:From:Subject:Cc:To:References:In-Reply-To:From; b=oi7JGdzUlFzP/FMb8kEp7Q105WqQ3FGFR+OkCz/Ch/V8YuaGIEpFfY4ky1/60h3UV fDPQi9zXutKdJ05VDGVpkzEyHXaEF0zI4rGFfQQ+3eb3mF4AOIs/7eqsZINDkexlhw 60co1YZ3+XOwVoGLNwjo9h7610r0H4WtBBCifZPJ4Y/TMG3Z7FxCOMdtqeLaEWaPBS G62WW0HWvlV81CSZENm6c0hcThYR4/eF9Ow1BJRBikeyCKIenFKAZTba5iuUNCDAKp 7grboJso5ANcb/AGRkSiLKxts9tJIZiVyKMjqHKge3swMhy1dkXEYYAKsjcecStkxn mQzHUddP2dwyw== Mime-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=UTF-8 Date: Thu, 15 Jan 2026 12:44:25 +0100 Message-Id: From: "Danilo Krummrich" Subject: Re: [PATCH v5 09/28] drm/sched: Add fair scheduling policy Cc: , , , =?utf-8?q?Christian_K=C3=B6nig?= , "Matthew Brost" , "Philipp Stanner" , "Pierre-Eric Pelloux-Prayer" To: "Tvrtko Ursulin" References: <20251219135351.25880-1-tvrtko.ursulin@igalia.com> <20251219135351.25880-10-tvrtko.ursulin@igalia.com> <1fceb644-ff22-45c8-bd83-4a32786c35f2@igalia.com> In-Reply-To: <1fceb644-ff22-45c8-bd83-4a32786c35f2@igalia.com> X-BeenThere: amd-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Discussion list for AMD gfx List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: amd-gfx-bounces@lists.freedesktop.org Sender: "amd-gfx" On Thu Jan 15, 2026 at 9:28 AM CET, Tvrtko Ursulin wrote: > > On 14/01/2026 22:13, Danilo Krummrich wrote: >> On Fri Dec 19, 2025 at 2:53 PM CET, Tvrtko Ursulin wrote: >>> diff --git a/drivers/gpu/drm/scheduler/sched_rq.c b/drivers/gpu/drm/sch= eduler/sched_rq.c >>> index 2d1f579d8352..2fde309d02a6 100644 >>> --- a/drivers/gpu/drm/scheduler/sched_rq.c >>> +++ b/drivers/gpu/drm/scheduler/sched_rq.c >>> @@ -16,6 +16,35 @@ drm_sched_entity_compare_before(struct rb_node *a, c= onst struct rb_node *b) >>> return ktime_before(ea->oldest_job_waiting, eb->oldest_job_waiting); >>> } >>> =20 >>> +static void drm_sched_rq_update_prio(struct drm_sched_rq *rq) >>> +{ >>> + enum drm_sched_priority prio =3D DRM_SCHED_PRIORITY_INVALID; >>> + struct rb_node *rb; >>> + >>> + lockdep_assert_held(&rq->lock); >>> + >>> + rb =3D rb_first_cached(&rq->rb_tree_root); >>> + if (rb) { >>> + struct drm_sched_entity *entity =3D >>> + rb_entry(rb, typeof(*entity), rb_tree_node); >>> + >>> + /* >>> + * The normal locking order is entity then run-queue so taking >>> + * the entity lock here would be a locking inversion for the >>> + * case when the current head of the run-queue is different from >>> + * the one we already have locked. The unlocked read is fine >>> + * though, because if the priority had just changed it is no big >>> + * deal for our algorithm, but just a transient reachable only >>> + * by drivers with userspace dynamic priority changes API. Equal >>> + * in effect to the priority change becoming visible a few >>> + * instructions later. >>> + */ >>> + prio =3D READ_ONCE(entity->priority); >>=20 >> I still think that we should address the root cause of the lock inversio= n >> problem instead. >>=20 >> I previously mentioned that I can take a look at this beginning of this = year, >> which I can do soon. >>=20 >> In the meantime, can you please explain what's the problem with this spe= cific >> case? This function is only ever called from drm_sched_rq_remove_fifo_lo= cked() >> and drm_sched_rq_update_fifo_locked(), which already seem to hold both l= ocks. > > The entity which is locked is likely not the same as entity at the head= =20 > of the run-queue from either call chains. > > In one case we have just removed the locked entity from the run-queue,=20 > while in the other tree has been re-balanced so a different entity may=20 > have taken the head position. Ick! That makes it even worse because this would mean that even if we would= be able to take the entity lock here, this is also prone to lock inversion bet= ween entities. I.e. that is a huge indicator that it is even more necessary to revisit loc= king design in general. > Also to note is 99% of cases entity->priority is invariant. Only amdgpu= =20 > allows for change of priority post entity creation. So for the rest=20 > locking would not gain anything. > > Even for amdgpu the unlocked read is not very relevant, since the only=20 > thing this is used is to determine the run-queue insertion position of a= =20 > re-joining entity. So worst thing that could happen, if userspace thread= =20 > would race set priority with the job worker picking the next job, is to= =20 > *one time* pick a different job. I get that; it is less that dealing with the priority field by itself is a = huge issue we can't handle, it is more that the above workaround clearly points = out a (locking) design issue, which we should not ignore. It's not only about cod= e the code working or being correct, it's also about maintainability. (Even though I'm aware that DRM scheduler maintainability is a bit the DRM equivalent of the infamous 3x+1 problem. :) > Also to address the root cause of the lock inversion would IMHO be to=20 > re-design the whole scheduler and this specific function here does not=20 > seem should be the trigger for that. I'm not sure it is that bad, let me take a look in the next days and see wh= at options we have.