From: Philipp Stanner <pstanner@redhat.com>
To: Tvrtko Ursulin <tursulin@igalia.com>,
amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Cc: "Tvrtko Ursulin" <tvrtko.ursulin@igalia.com>,
"Christian König" <christian.koenig@amd.com>,
"Alex Deucher" <alexander.deucher@amd.com>,
"Luben Tuikov" <ltuikov89@gmail.com>,
"Matthew Brost" <matthew.brost@intel.com>,
"David Airlie" <airlied@gmail.com>,
"Daniel Vetter" <daniel@ffwll.ch>,
stable@vger.kernel.org
Subject: Re: [RFC 1/4] drm/sched: Add locking to drm_sched_entity_modify_sched
Date: Mon, 09 Sep 2024 11:44:36 +0200 [thread overview]
Message-ID: <8d763e5162ebc130a05da3cefbff148cdb6ce042.camel@redhat.com> (raw)
In-Reply-To: <20240906180618.12180-2-tursulin@igalia.com>
On Fri, 2024-09-06 at 19:06 +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>
> Without the locking amdgpu currently can race
> amdgpu_ctx_set_entity_priority() and drm_sched_job_arm(),
I would explicitly say "amdgpu's amdgpu_ctx_set_entity_priority() races
through drm_sched_entity_modify_sched() with drm_sched_job_arm()".
The actual issue then seems to be drm_sched_job_arm() calling
drm_sched_entity_select_rq(). I would mention that, too.
> leading to the
> latter accesing potentially inconsitent entity->sched_list and
> entity->num_sched_list pair.
>
> The comment on drm_sched_entity_modify_sched() however says:
>
> """
> * Note that this must be called under the same common lock for
> @entity as
> * drm_sched_job_arm() and drm_sched_entity_push_job(), or the driver
> needs to
> * guarantee through some other means that this is never called while
> new jobs
> * can be pushed to @entity.
> """
>
> It is unclear if that is referring to this race or something else.
That comment is indeed a bit awkward. Both drm_sched_entity_push_job()
and drm_sched_job_arm() take rq_lock. But
drm_sched_entity_modify_sched() doesn't.
The comment was written in 981b04d968561. Interestingly, in
drm_sched_entity_push_job(), this "common lock" is mentioned with the
soft requirement word "should" and apparently is more about keeping
sequence numbers in order when inserting.
I tend to think that the issue discovered by you is unrelated to that
comment. But if no one can make sense of the comment, should it maybe
be removed? Confusing comment is arguably worse than no comment
P.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Fixes: b37aced31eb0 ("drm/scheduler: implement a function to modify
> sched list")
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Luben Tuikov <ltuikov89@gmail.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: David Airlie <airlied@gmail.com>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Cc: dri-devel@lists.freedesktop.org
> Cc: <stable@vger.kernel.org> # v5.7+
> ---
> drivers/gpu/drm/scheduler/sched_entity.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
> b/drivers/gpu/drm/scheduler/sched_entity.c
> index 58c8161289fe..ae8be30472cd 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -133,8 +133,10 @@ void drm_sched_entity_modify_sched(struct
> drm_sched_entity *entity,
> {
> WARN_ON(!num_sched_list || !sched_list);
>
> + spin_lock(&entity->rq_lock);
> entity->sched_list = sched_list;
> entity->num_sched_list = num_sched_list;
> + spin_unlock(&entity->rq_lock);
> }
> EXPORT_SYMBOL(drm_sched_entity_modify_sched);
>
next prev parent reply other threads:[~2024-09-09 9:44 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20240906180618.12180-1-tursulin@igalia.com>
2024-09-06 18:06 ` [RFC 1/4] drm/sched: Add locking to drm_sched_entity_modify_sched Tvrtko Ursulin
2024-09-09 9:44 ` Philipp Stanner [this message]
2024-09-09 11:29 ` Christian König
2024-09-09 12:13 ` Philipp Stanner
2024-09-09 12:18 ` Christian König
2024-09-09 12:37 ` Tvrtko Ursulin
2024-09-09 12:46 ` Philipp Stanner
2024-09-09 13:27 ` Tvrtko Ursulin
2024-09-09 13:40 ` Philipp Stanner
2024-09-09 13:50 ` Christian König
2024-09-06 18:06 ` [RFC 2/4] drm/sched: Always wake up correct scheduler in drm_sched_entity_push_job Tvrtko Ursulin
2024-09-09 9:51 ` Philipp Stanner
2024-09-09 11:31 ` Christian König
2024-09-06 18:06 ` [RFC 3/4] drm/sched: Always increment correct scheduler score Tvrtko Ursulin
2024-09-09 11:33 ` Christian König
2024-09-09 12:32 ` Nirmoy Das
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8d763e5162ebc130a05da3cefbff148cdb6ce042.camel@redhat.com \
--to=pstanner@redhat.com \
--cc=airlied@gmail.com \
--cc=alexander.deucher@amd.com \
--cc=amd-gfx@lists.freedesktop.org \
--cc=christian.koenig@amd.com \
--cc=daniel@ffwll.ch \
--cc=dri-devel@lists.freedesktop.org \
--cc=ltuikov89@gmail.com \
--cc=matthew.brost@intel.com \
--cc=stable@vger.kernel.org \
--cc=tursulin@igalia.com \
--cc=tvrtko.ursulin@igalia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox