From: "Christian König" <christian.koenig@amd.com>
To: Philipp Stanner <pstanner@redhat.com>,
Tvrtko Ursulin <tursulin@igalia.com>,
amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Cc: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>,
Alex Deucher <alexander.deucher@amd.com>,
Luben Tuikov <ltuikov89@gmail.com>,
Matthew Brost <matthew.brost@intel.com>,
David Airlie <airlied@gmail.com>, Daniel Vetter <daniel@ffwll.ch>,
stable@vger.kernel.org
Subject: Re: [RFC 1/4] drm/sched: Add locking to drm_sched_entity_modify_sched
Date: Mon, 9 Sep 2024 13:29:27 +0200 [thread overview]
Message-ID: <80e02cde-19e7-4fb6-a572-fb45a639a3b7@amd.com> (raw)
In-Reply-To: <8d763e5162ebc130a05da3cefbff148cdb6ce042.camel@redhat.com>
Am 09.09.24 um 11:44 schrieb Philipp Stanner:
> On Fri, 2024-09-06 at 19:06 +0100, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>>
>> Without the locking amdgpu currently can race
>> amdgpu_ctx_set_entity_priority() and drm_sched_job_arm(),
> I would explicitly say "amdgpu's amdgpu_ctx_set_entity_priority() races
> through drm_sched_entity_modify_sched() with drm_sched_job_arm()".
>
> The actual issue then seems to be drm_sched_job_arm() calling
> drm_sched_entity_select_rq(). I would mention that, too.
>
>
>> leading to the
>> latter accesing potentially inconsitent entity->sched_list and
>> entity->num_sched_list pair.
>>
>> The comment on drm_sched_entity_modify_sched() however says:
>>
>> """
>> * Note that this must be called under the same common lock for
>> @entity as
>> * drm_sched_job_arm() and drm_sched_entity_push_job(), or the driver
>> needs to
>> * guarantee through some other means that this is never called while
>> new jobs
>> * can be pushed to @entity.
>> """
>>
>> It is unclear if that is referring to this race or something else.
> That comment is indeed a bit awkward. Both drm_sched_entity_push_job()
> and drm_sched_job_arm() take rq_lock. But
> drm_sched_entity_modify_sched() doesn't.
>
> The comment was written in 981b04d968561. Interestingly, in
> drm_sched_entity_push_job(), this "common lock" is mentioned with the
> soft requirement word "should" and apparently is more about keeping
> sequence numbers in order when inserting.
>
> I tend to think that the issue discovered by you is unrelated to that
> comment. But if no one can make sense of the comment, should it maybe
> be removed? Confusing comment is arguably worse than no comment
Agree, we probably mixed up in 981b04d968561 that submission needs a
common lock and that rq/priority needs to be protected by the rq_lock.
There is also the big FIXME in the drm_sched_entity documentation
pointing out that this is most likely not implemented correctly.
I suggest to move the rq, priority and rq_lock fields together in the
drm_sched_entity structure and document that rq_lock is protecting the two.
Then audit the code if all users of rq and priority actually hold the
correct locks while reading and writing them.
Regards,
Christian.
>
> P.
>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>> Fixes: b37aced31eb0 ("drm/scheduler: implement a function to modify
>> sched list")
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Luben Tuikov <ltuikov89@gmail.com>
>> Cc: Matthew Brost <matthew.brost@intel.com>
>> Cc: David Airlie <airlied@gmail.com>
>> Cc: Daniel Vetter <daniel@ffwll.ch>
>> Cc: dri-devel@lists.freedesktop.org
>> Cc: <stable@vger.kernel.org> # v5.7+
>> ---
>> drivers/gpu/drm/scheduler/sched_entity.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
>> b/drivers/gpu/drm/scheduler/sched_entity.c
>> index 58c8161289fe..ae8be30472cd 100644
>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>> @@ -133,8 +133,10 @@ void drm_sched_entity_modify_sched(struct
>> drm_sched_entity *entity,
>> {
>> WARN_ON(!num_sched_list || !sched_list);
>>
>> + spin_lock(&entity->rq_lock);
>> entity->sched_list = sched_list;
>> entity->num_sched_list = num_sched_list;
>> + spin_unlock(&entity->rq_lock);
>> }
>> EXPORT_SYMBOL(drm_sched_entity_modify_sched);
>>
next prev parent reply other threads:[~2024-09-09 11:29 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20240906180618.12180-1-tursulin@igalia.com>
2024-09-06 18:06 ` [RFC 1/4] drm/sched: Add locking to drm_sched_entity_modify_sched Tvrtko Ursulin
2024-09-09 9:44 ` Philipp Stanner
2024-09-09 11:29 ` Christian König [this message]
2024-09-09 12:13 ` Philipp Stanner
2024-09-09 12:18 ` Christian König
2024-09-09 12:37 ` Tvrtko Ursulin
2024-09-09 12:46 ` Philipp Stanner
2024-09-09 13:27 ` Tvrtko Ursulin
2024-09-09 13:40 ` Philipp Stanner
2024-09-09 13:50 ` Christian König
2024-09-06 18:06 ` [RFC 2/4] drm/sched: Always wake up correct scheduler in drm_sched_entity_push_job Tvrtko Ursulin
2024-09-09 9:51 ` Philipp Stanner
2024-09-09 11:31 ` Christian König
2024-09-06 18:06 ` [RFC 3/4] drm/sched: Always increment correct scheduler score Tvrtko Ursulin
2024-09-09 11:33 ` Christian König
2024-09-09 12:32 ` Nirmoy Das
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=80e02cde-19e7-4fb6-a572-fb45a639a3b7@amd.com \
--to=christian.koenig@amd.com \
--cc=airlied@gmail.com \
--cc=alexander.deucher@amd.com \
--cc=amd-gfx@lists.freedesktop.org \
--cc=daniel@ffwll.ch \
--cc=dri-devel@lists.freedesktop.org \
--cc=ltuikov89@gmail.com \
--cc=matthew.brost@intel.com \
--cc=pstanner@redhat.com \
--cc=stable@vger.kernel.org \
--cc=tursulin@igalia.com \
--cc=tvrtko.ursulin@igalia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox