Linux kernel -stable discussions
 help / color / mirror / Atom feed
From: Philipp Stanner <pstanner@redhat.com>
To: "Tvrtko Ursulin" <tvrtko.ursulin@igalia.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Tvrtko Ursulin" <tursulin@igalia.com>,
	amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Cc: Alex Deucher <alexander.deucher@amd.com>,
	Luben Tuikov <ltuikov89@gmail.com>,
	 Matthew Brost <matthew.brost@intel.com>,
	David Airlie <airlied@gmail.com>, Daniel Vetter <daniel@ffwll.ch>,
	 stable@vger.kernel.org
Subject: Re: [RFC 1/4] drm/sched: Add locking to drm_sched_entity_modify_sched
Date: Mon, 09 Sep 2024 14:46:57 +0200	[thread overview]
Message-ID: <d8944e6ad57e4efcd480d917a38f9cee9475d59c.camel@redhat.com> (raw)
In-Reply-To: <14ef37f4-b982-41c1-8121-80882917e9c0@igalia.com>

On Mon, 2024-09-09 at 13:37 +0100, Tvrtko Ursulin wrote:
> 
> On 09/09/2024 13:18, Christian König wrote:
> > Am 09.09.24 um 14:13 schrieb Philipp Stanner:
> > > On Mon, 2024-09-09 at 13:29 +0200, Christian König wrote:
> > > > Am 09.09.24 um 11:44 schrieb Philipp Stanner:
> > > > > On Fri, 2024-09-06 at 19:06 +0100, Tvrtko Ursulin wrote:
> > > > > > From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> > > > > > 
> > > > > > Without the locking amdgpu currently can race
> > > > > > amdgpu_ctx_set_entity_priority() and drm_sched_job_arm(),
> > > > > I would explicitly say "amdgpu's
> > > > > amdgpu_ctx_set_entity_priority()
> > > > > races
> > > > > through drm_sched_entity_modify_sched() with
> > > > > drm_sched_job_arm()".
> > > > > 
> > > > > The actual issue then seems to be drm_sched_job_arm() calling
> > > > > drm_sched_entity_select_rq(). I would mention that, too.
> > > > > 
> > > > > 
> > > > > > leading to the
> > > > > > latter accesing potentially inconsitent entity->sched_list
> > > > > > and
> > > > > > entity->num_sched_list pair.
> > > > > > 
> > > > > > The comment on drm_sched_entity_modify_sched() however
> > > > > > says:
> > > > > > 
> > > > > > """
> > > > > >    * Note that this must be called under the same common
> > > > > > lock for
> > > > > > @entity as
> > > > > >    * drm_sched_job_arm() and drm_sched_entity_push_job(),
> > > > > > or the
> > > > > > driver
> > > > > > needs to
> > > > > >    * guarantee through some other means that this is never
> > > > > > called
> > > > > > while
> > > > > > new jobs
> > > > > >    * can be pushed to @entity.
> > > > > > """
> > > > > > 
> > > > > > It is unclear if that is referring to this race or
> > > > > > something
> > > > > > else.
> > > > > That comment is indeed a bit awkward. Both
> > > > > drm_sched_entity_push_job()
> > > > > and drm_sched_job_arm() take rq_lock. But
> > > > > drm_sched_entity_modify_sched() doesn't.
> > > > > 
> > > > > The comment was written in 981b04d968561. Interestingly, in
> > > > > drm_sched_entity_push_job(), this "common lock" is mentioned
> > > > > with
> > > > > the
> > > > > soft requirement word "should" and apparently is more about
> > > > > keeping
> > > > > sequence numbers in order when inserting.
> > > > > 
> > > > > I tend to think that the issue discovered by you is unrelated
> > > > > to
> > > > > that
> > > > > comment. But if no one can make sense of the comment, should
> > > > > it
> > > > > maybe
> > > > > be removed? Confusing comment is arguably worse than no
> > > > > comment
> > > > Agree, we probably mixed up in 981b04d968561 that submission
> > > > needs a
> > > > common lock and that rq/priority needs to be protected by the
> > > > rq_lock.
> > > > 
> > > > There is also the big FIXME in the drm_sched_entity
> > > > documentation
> > > > pointing out that this is most likely not implemented
> > > > correctly.
> > > > 
> > > > I suggest to move the rq, priority and rq_lock fields together
> > > > in the
> > > > drm_sched_entity structure and document that rq_lock is
> > > > protecting
> > > > the two.
> > > That could also be a great opportunity for improving the lock
> > > naming:
> > 
> > Well that comment made me laugh because I point out the same when
> > the 
> > scheduler came out ~8years ago and nobody cared about it since
> > then.
> > 
> > But yeah completely agree :)
> 
> Maybe, but we need to keep in sight the fact some of these fixes may
> be 
> good to backport. In which case re-naming exercises are best left to
> follow.

My argument basically. It's good if fixes and other improvements are
separated, in general, unless there is a practical / good reason not
to.

> 
> Also..
> 
> > > void drm_sched_rq_update_fifo(struct drm_sched_entity *entity,
> > > ktime_t 
> > > ts)
> > > {
> > >     /*
> > >      * Both locks need to be grabbed, one to protect from entity-
> > > >rq 
> > > change
> > >      * for entity from within concurrent
> > > drm_sched_entity_select_rq 
> > > and the
> > >      * other to update the rb tree structure.
> > >      */
> > >     spin_lock(&entity->rq_lock);
> > >     spin_lock(&entity->rq->lock);
> 
> .. I agree this is quite unredable and my initial reaction was a
> similar 
> ugh. However.. What names would you guys suggest and for what to make
> this better and not lessen the logic of naming each individually?

According to the documentation, drm_sched_rq.lock does not protect the
entire runque, but "@lock: to modify the entities list."

So I would keep drm_sched_entity.rq_lock as it is, because it indeed
protects the entire runqueue.

And drm_sched_rq.lock could be named "entities_lock" or
"entities_list_lock" or something. That's debatable, but it should be
something that highlights that this lock is not for locking the entire
runque as the one in the entity apparently is.


Cheers,
P.

> 
> Regards,
> 
> Tvrtko
> 
> > > [...]
> > > 
> > > 
> > > P.
> > > 
> > > 
> > > > Then audit the code if all users of rq and priority actually
> > > > hold the
> > > > correct locks while reading and writing them.
> > > > 
> > > > Regards,
> > > > Christian.
> > > > 
> > > > > P.
> > > > > 
> > > > > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> > > > > > Fixes: b37aced31eb0 ("drm/scheduler: implement a function
> > > > > > to
> > > > > > modify
> > > > > > sched list")
> > > > > > Cc: Christian König <christian.koenig@amd.com>
> > > > > > Cc: Alex Deucher <alexander.deucher@amd.com>
> > > > > > Cc: Luben Tuikov <ltuikov89@gmail.com>
> > > > > > Cc: Matthew Brost <matthew.brost@intel.com>
> > > > > > Cc: David Airlie <airlied@gmail.com>
> > > > > > Cc: Daniel Vetter <daniel@ffwll.ch>
> > > > > > Cc: dri-devel@lists.freedesktop.org
> > > > > > Cc: <stable@vger.kernel.org> # v5.7+
> > > > > > ---
> > > > > >    drivers/gpu/drm/scheduler/sched_entity.c | 2 ++
> > > > > >    1 file changed, 2 insertions(+)
> > > > > > 
> > > > > > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
> > > > > > b/drivers/gpu/drm/scheduler/sched_entity.c
> > > > > > index 58c8161289fe..ae8be30472cd 100644
> > > > > > --- a/drivers/gpu/drm/scheduler/sched_entity.c
> > > > > > +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> > > > > > @@ -133,8 +133,10 @@ void
> > > > > > drm_sched_entity_modify_sched(struct
> > > > > > drm_sched_entity *entity,
> > > > > >    {
> > > > > >        WARN_ON(!num_sched_list || !sched_list);
> > > > > > +    spin_lock(&entity->rq_lock);
> > > > > >        entity->sched_list = sched_list;
> > > > > >        entity->num_sched_list = num_sched_list;
> > > > > > +    spin_unlock(&entity->rq_lock);
> > > > > >    }
> > > > > >    EXPORT_SYMBOL(drm_sched_entity_modify_sched);
> > 
> 


  reply	other threads:[~2024-09-09 12:47 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20240906180618.12180-1-tursulin@igalia.com>
2024-09-06 18:06 ` [RFC 1/4] drm/sched: Add locking to drm_sched_entity_modify_sched Tvrtko Ursulin
2024-09-09  9:44   ` Philipp Stanner
2024-09-09 11:29     ` Christian König
2024-09-09 12:13       ` Philipp Stanner
2024-09-09 12:18         ` Christian König
2024-09-09 12:37           ` Tvrtko Ursulin
2024-09-09 12:46             ` Philipp Stanner [this message]
2024-09-09 13:27               ` Tvrtko Ursulin
2024-09-09 13:40                 ` Philipp Stanner
2024-09-09 13:50                 ` Christian König
2024-09-06 18:06 ` [RFC 2/4] drm/sched: Always wake up correct scheduler in drm_sched_entity_push_job Tvrtko Ursulin
2024-09-09  9:51   ` Philipp Stanner
2024-09-09 11:31   ` Christian König
2024-09-06 18:06 ` [RFC 3/4] drm/sched: Always increment correct scheduler score Tvrtko Ursulin
2024-09-09 11:33   ` Christian König
2024-09-09 12:32   ` Nirmoy Das

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d8944e6ad57e4efcd480d917a38f9cee9475d59c.camel@redhat.com \
    --to=pstanner@redhat.com \
    --cc=airlied@gmail.com \
    --cc=alexander.deucher@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=christian.koenig@amd.com \
    --cc=daniel@ffwll.ch \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=ltuikov89@gmail.com \
    --cc=matthew.brost@intel.com \
    --cc=stable@vger.kernel.org \
    --cc=tursulin@igalia.com \
    --cc=tvrtko.ursulin@igalia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox