Linux kernel -stable discussions
From: "Christian König" <christian.koenig@amd.com>
To: Philipp Stanner <pstanner@redhat.com>,
	Tvrtko Ursulin <tursulin@igalia.com>,
	amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Cc: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>,
	Alex Deucher <alexander.deucher@amd.com>,
	Luben Tuikov <ltuikov89@gmail.com>,
	Matthew Brost <matthew.brost@intel.com>,
	David Airlie <airlied@gmail.com>, Daniel Vetter <daniel@ffwll.ch>,
	stable@vger.kernel.org
Subject: Re: [RFC 1/4] drm/sched: Add locking to drm_sched_entity_modify_sched
Date: Mon, 9 Sep 2024 14:18:49 +0200
Message-ID: <fb9556a1-b48d-49ed-9b9c-74b21fb76af4@amd.com>
In-Reply-To: <2356e3d66da3e5795295267e527042ab44f192c8.camel@redhat.com>

On 09.09.24 at 14:13, Philipp Stanner wrote:
> On Mon, 2024-09-09 at 13:29 +0200, Christian König wrote:
>> On 09.09.24 at 11:44, Philipp Stanner wrote:
>>> On Fri, 2024-09-06 at 19:06 +0100, Tvrtko Ursulin wrote:
>>>> From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>>>>
>>>> Without the locking amdgpu currently can race
>>>> amdgpu_ctx_set_entity_priority() and drm_sched_job_arm(),
>>> I would explicitly say "amdgpu's amdgpu_ctx_set_entity_priority()
>>> races
>>> through drm_sched_entity_modify_sched() with drm_sched_job_arm()".
>>>
>>> The actual issue then seems to be drm_sched_job_arm() calling
>>> drm_sched_entity_select_rq(). I would mention that, too.
>>>
>>>
>>>> leading to the
>>>> latter accessing a potentially inconsistent entity->sched_list and
>>>> entity->num_sched_list pair.
>>>>
>>>> The comment on drm_sched_entity_modify_sched() however says:
>>>>
>>>> """
>>>>    * Note that this must be called under the same common lock for
>>>> @entity as
>>>>    * drm_sched_job_arm() and drm_sched_entity_push_job(), or the
>>>> driver
>>>> needs to
>>>>    * guarantee through some other means that this is never called
>>>> while
>>>> new jobs
>>>>    * can be pushed to @entity.
>>>> """
>>>>
>>>> It is unclear if that is referring to this race or something
>>>> else.
>>> That comment is indeed a bit awkward. Both
>>> drm_sched_entity_push_job()
>>> and drm_sched_job_arm() take rq_lock. But
>>> drm_sched_entity_modify_sched() doesn't.
>>>
>>> The comment was written in 981b04d968561. Interestingly, in
>>> drm_sched_entity_push_job(), this "common lock" is mentioned with
>>> the
>>> soft requirement word "should" and apparently is more about keeping
>>> sequence numbers in order when inserting.
>>>
>>> I tend to think that the issue discovered by you is unrelated to
>>> that
>>> comment. But if no one can make sense of the comment, should it
>>> maybe
>>> be removed? A confusing comment is arguably worse than no comment.
>> Agree, we probably mixed up in 981b04d968561 that submission needs a
>> common lock and that rq/priority needs to be protected by the
>> rq_lock.
>>
>> There is also the big FIXME in the drm_sched_entity documentation
>> pointing out that this is most likely not implemented correctly.
>>
>> I suggest moving the rq, priority and rq_lock fields together in the
>> drm_sched_entity structure and documenting that rq_lock protects
>> the two.
> That could also be a great opportunity for improving the lock naming:

Well, that comment made me laugh because I pointed out the same thing when
the scheduler came out ~8 years ago, and nobody has cared about it since then.

But yeah, I completely agree :)

Christian.

>
> void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts)
> {
> 	/*
> 	 * Both locks need to be grabbed, one to protect from entity->rq change
> 	 * for entity from within concurrent drm_sched_entity_select_rq and the
> 	 * other to update the rb tree structure.
> 	 */
> 	spin_lock(&entity->rq_lock);
> 	spin_lock(&entity->rq->lock);
>
> [...]
>
>
> P.
>
>
>> Then audit the code to check whether all users of rq and priority
>> actually hold the correct locks while reading and writing them.
>>
>> Regards,
>> Christian.
>>
>>> P.
>>>
>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>>>> Fixes: b37aced31eb0 ("drm/scheduler: implement a function to
>>>> modify
>>>> sched list")
>>>> Cc: Christian König <christian.koenig@amd.com>
>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>>> Cc: Luben Tuikov <ltuikov89@gmail.com>
>>>> Cc: Matthew Brost <matthew.brost@intel.com>
>>>> Cc: David Airlie <airlied@gmail.com>
>>>> Cc: Daniel Vetter <daniel@ffwll.ch>
>>>> Cc: dri-devel@lists.freedesktop.org
>>>> Cc: <stable@vger.kernel.org> # v5.7+
>>>> ---
>>>>    drivers/gpu/drm/scheduler/sched_entity.c | 2 ++
>>>>    1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
>>>> b/drivers/gpu/drm/scheduler/sched_entity.c
>>>> index 58c8161289fe..ae8be30472cd 100644
>>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>>> @@ -133,8 +133,10 @@ void drm_sched_entity_modify_sched(struct
>>>> drm_sched_entity *entity,
>>>>    {
>>>>    	WARN_ON(!num_sched_list || !sched_list);
>>>>    
>>>> +	spin_lock(&entity->rq_lock);
>>>>    	entity->sched_list = sched_list;
>>>>    	entity->num_sched_list = num_sched_list;
>>>> +	spin_unlock(&entity->rq_lock);
>>>>    }
>>>>    EXPORT_SYMBOL(drm_sched_entity_modify_sched);
>>>>    
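
For readers following the thread, here is a minimal userspace sketch of the
idea behind the quoted patch: the sched_list pointer and its length are
written and read under one lock, so a reader never sees a new pointer paired
with an old length. It also groups the protected fields next to the lock and
documents what the lock covers, along the lines suggested above. Everything
named demo_* is hypothetical and made up for illustration, and a pthread mutex
stands in for the kernel spinlock; this is not the scheduler's actual code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a GPU scheduler; not a kernel type. */
struct demo_sched {
	int id;
};

/*
 * Hypothetical stand-in for the entity. The protected fields sit directly
 * after the lock, and the comment states what the lock covers.
 */
struct demo_entity {
	pthread_mutex_t lock;		/* protects sched_list and num_sched_list */
	struct demo_sched **sched_list;
	unsigned int num_sched_list;
};

void demo_entity_init(struct demo_entity *e, struct demo_sched **list,
		      unsigned int num)
{
	pthread_mutex_init(&e->lock, NULL);
	e->sched_list = list;
	e->num_sched_list = num;
}

/* Writer: replace pointer and length as one unit under the lock. */
void demo_entity_modify_sched(struct demo_entity *e, struct demo_sched **list,
			      unsigned int num)
{
	assert(list != NULL && num != 0);

	pthread_mutex_lock(&e->lock);
	e->sched_list = list;
	e->num_sched_list = num;
	pthread_mutex_unlock(&e->lock);
}

/*
 * Reader: uses both fields together, so it takes the same lock. Without it,
 * a concurrent writer could leave the index out of bounds for the list that
 * is actually read.
 */
struct demo_sched *demo_entity_pick_last(struct demo_entity *e)
{
	struct demo_sched *picked;

	pthread_mutex_lock(&e->lock);
	picked = e->sched_list[e->num_sched_list - 1];
	pthread_mutex_unlock(&e->lock);

	return picked;
}
```

The sketch only shows the pairing of writer and reader under one lock. It
does not model the nested entity->rq_lock / rq->lock ordering quoted from
drm_sched_rq_update_fifo() above.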


