From: Philipp Stanner <pstanner@redhat.com>
To: "Christian König" <christian.koenig@amd.com>,
"Tvrtko Ursulin" <tursulin@igalia.com>,
amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Cc: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>,
Alex Deucher <alexander.deucher@amd.com>,
Luben Tuikov <ltuikov89@gmail.com>,
Matthew Brost <matthew.brost@intel.com>,
David Airlie <airlied@gmail.com>, Daniel Vetter <daniel@ffwll.ch>,
stable@vger.kernel.org
Subject: Re: [RFC 1/4] drm/sched: Add locking to drm_sched_entity_modify_sched
Date: Mon, 09 Sep 2024 14:13:50 +0200 [thread overview]
Message-ID: <2356e3d66da3e5795295267e527042ab44f192c8.camel@redhat.com> (raw)
In-Reply-To: <80e02cde-19e7-4fb6-a572-fb45a639a3b7@amd.com>
On Mon, 2024-09-09 at 13:29 +0200, Christian König wrote:
> Am 09.09.24 um 11:44 schrieb Philipp Stanner:
> > On Fri, 2024-09-06 at 19:06 +0100, Tvrtko Ursulin wrote:
> > > From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> > >
> > > Without the locking amdgpu currently can race
> > > amdgpu_ctx_set_entity_priority() and drm_sched_job_arm(),
> > I would explicitly say "amdgpu's amdgpu_ctx_set_entity_priority()
> > races
> > through drm_sched_entity_modify_sched() with drm_sched_job_arm()".
> >
> > The actual issue then seems to be drm_sched_job_arm() calling
> > drm_sched_entity_select_rq(). I would mention that, too.
> >
> >
> > > leading to the
> > > latter accesing potentially inconsitent entity->sched_list and
> > > entity->num_sched_list pair.
> > >
> > > The comment on drm_sched_entity_modify_sched() however says:
> > >
> > > """
> > > * Note that this must be called under the same common lock for
> > > @entity as
> > > * drm_sched_job_arm() and drm_sched_entity_push_job(), or the
> > > driver
> > > needs to
> > > * guarantee through some other means that this is never called
> > > while
> > > new jobs
> > > * can be pushed to @entity.
> > > """
> > >
> > > It is unclear if that is referring to this race or something
> > > else.
> > That comment is indeed a bit awkward. Both
> > drm_sched_entity_push_job()
> > and drm_sched_job_arm() take rq_lock. But
> > drm_sched_entity_modify_sched() doesn't.
> >
> > The comment was written in 981b04d968561. Interestingly, in
> > drm_sched_entity_push_job(), this "common lock" is mentioned with
> > the
> > soft requirement word "should" and apparently is more about keeping
> > sequence numbers in order when inserting.
> >
> > I tend to think that the issue discovered by you is unrelated to
> > that
> > comment. But if no one can make sense of the comment, should it
> > maybe
> > be removed? Confusing comment is arguably worse than no comment
>
> Agree, we probably mixed up in 981b04d968561 that submission needs a
> common lock and that rq/priority needs to be protected by the
> rq_lock.
>
> There is also the big FIXME in the drm_sched_entity documentation
> pointing out that this is most likely not implemented correctly.
>
> I suggest to move the rq, priority and rq_lock fields together in the
> drm_sched_entity structure and document that rq_lock is protecting
> the two.
That could also be a great opportunity for improving the lock naming:
void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts)
{
/*
* Both locks need to be grabbed, one to protect from entity->rq change
* for entity from within concurrent drm_sched_entity_select_rq and the
* other to update the rb tree structure.
*/
spin_lock(&entity->rq_lock);
spin_lock(&entity->rq->lock);
[...]
P.
>
> Then audit the code if all users of rq and priority actually hold the
> correct locks while reading and writing them.
>
> Regards,
> Christian.
>
> >
> > P.
> >
> > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> > > Fixes: b37aced31eb0 ("drm/scheduler: implement a function to
> > > modify
> > > sched list")
> > > Cc: Christian König <christian.koenig@amd.com>
> > > Cc: Alex Deucher <alexander.deucher@amd.com>
> > > Cc: Luben Tuikov <ltuikov89@gmail.com>
> > > Cc: Matthew Brost <matthew.brost@intel.com>
> > > Cc: David Airlie <airlied@gmail.com>
> > > Cc: Daniel Vetter <daniel@ffwll.ch>
> > > Cc: dri-devel@lists.freedesktop.org
> > > Cc: <stable@vger.kernel.org> # v5.7+
> > > ---
> > > drivers/gpu/drm/scheduler/sched_entity.c | 2 ++
> > > 1 file changed, 2 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
> > > b/drivers/gpu/drm/scheduler/sched_entity.c
> > > index 58c8161289fe..ae8be30472cd 100644
> > > --- a/drivers/gpu/drm/scheduler/sched_entity.c
> > > +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> > > @@ -133,8 +133,10 @@ void drm_sched_entity_modify_sched(struct
> > > drm_sched_entity *entity,
> > > {
> > > WARN_ON(!num_sched_list || !sched_list);
> > >
> > > + spin_lock(&entity->rq_lock);
> > > entity->sched_list = sched_list;
> > > entity->num_sched_list = num_sched_list;
> > > + spin_unlock(&entity->rq_lock);
> > > }
> > > EXPORT_SYMBOL(drm_sched_entity_modify_sched);
> > >
>
next prev parent reply other threads:[~2024-09-09 12:13 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20240906180618.12180-1-tursulin@igalia.com>
2024-09-06 18:06 ` [RFC 1/4] drm/sched: Add locking to drm_sched_entity_modify_sched Tvrtko Ursulin
2024-09-09 9:44 ` Philipp Stanner
2024-09-09 11:29 ` Christian König
2024-09-09 12:13 ` Philipp Stanner [this message]
2024-09-09 12:18 ` Christian König
2024-09-09 12:37 ` Tvrtko Ursulin
2024-09-09 12:46 ` Philipp Stanner
2024-09-09 13:27 ` Tvrtko Ursulin
2024-09-09 13:40 ` Philipp Stanner
2024-09-09 13:50 ` Christian König
2024-09-06 18:06 ` [RFC 2/4] drm/sched: Always wake up correct scheduler in drm_sched_entity_push_job Tvrtko Ursulin
2024-09-09 9:51 ` Philipp Stanner
2024-09-09 11:31 ` Christian König
2024-09-06 18:06 ` [RFC 3/4] drm/sched: Always increment correct scheduler score Tvrtko Ursulin
2024-09-09 11:33 ` Christian König
2024-09-09 12:32 ` Nirmoy Das
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2356e3d66da3e5795295267e527042ab44f192c8.camel@redhat.com \
--to=pstanner@redhat.com \
--cc=airlied@gmail.com \
--cc=alexander.deucher@amd.com \
--cc=amd-gfx@lists.freedesktop.org \
--cc=christian.koenig@amd.com \
--cc=daniel@ffwll.ch \
--cc=dri-devel@lists.freedesktop.org \
--cc=ltuikov89@gmail.com \
--cc=matthew.brost@intel.com \
--cc=stable@vger.kernel.org \
--cc=tursulin@igalia.com \
--cc=tvrtko.ursulin@igalia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox