public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Luben Tuikov <luben.tuikov@amd.com>
To: "Danilo Krummrich" <dakr@redhat.com>,
	"Christian König" <christian.koenig@amd.com>,
	airlied@gmail.com, daniel@ffwll.ch, l.stach@pengutronix.de,
	"Prosyak, Vitaly" <Vitaly.Prosyak@amd.com>
Cc: dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] drm/scheduler: set entity to NULL in drm_sched_entity_pop_job()
Date: Tue, 11 Apr 2023 16:06:30 -0400	[thread overview]
Message-ID: <21c789de-e381-cccd-9093-741dcd7a2dc5@amd.com> (raw)
In-Reply-To: <32c28b47-2a3b-db1d-e927-ae44d52cae0b@redhat.com>

On 2023-04-11 14:13, Danilo Krummrich wrote:
> On 4/5/23 19:39, Luben Tuikov wrote:
>> On 2023-03-31 01:59, Christian König wrote:
>>> Am 31.03.23 um 02:06 schrieb Danilo Krummrich:
>>>> It already happend a few times that patches slipped through which
>>>> implemented access to an entity through a job that was already removed
>>>> from the entities queue. Since jobs and entities might have different
>>>> lifecycles, this can potentially cause UAF bugs.
>>>>
>>>> In order to make it obvious that a jobs entity pointer shouldn't be
>>>> accessed after drm_sched_entity_pop_job() was called successfully, set
>>>> the jobs entity pointer to NULL once the job is removed from the entity
>>>> queue.
>>>>
>>>> Moreover, debugging a potential NULL pointer dereference is way easier
>>>> than potentially corrupted memory through a UAF.
>>>>
>>>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>>>
>>> In general "YES PLEASE!", but I fear that this will break amdgpus reset
>>> sequence.
>>>
>>> On the other hand when amdgpu still relies on that pointer it's clearly
>>> a bug (which I pointed out tons of times before).
>>>
>>> Luben any opinion on that? Could you drive cleaning that up as well?
>>
>> I didn't find any references to scheduling entity after the job
>> is submitted to the hardware. (I commented the same in the other
>> thread, we just need to decide which way to go.)
> 
> AFAICS from the other mail thread it seems to be consensus to not 
> ref-count entities and handle job statistics differently.
> 
> Should we go ahead and take this patch then? Maybe it also makes sense 
> to send a V2 additionally adding a comment to the drm_sched_job 
> structure mentioning that .entity must not be used after the job was 
> taken from the entities queue.

Yes, please send a v2, but instead of mentioning (or in addition to)
that the job was taken from the "entities queue", mention that
once the job is pushed to the hardware, i.e. the pending queue,
from then on, the "entity" should not be referenced anymore. IOW, we
want to mention that it is going from X to Y, as opposed to just
that it's taken from X, because the latter begs the question "Where
is it going to then?".

For the record, I think that using kref would give us the most
stability, even if we skipped "entity" and let the scheduler, or
even the controller keep a kref on submitted commands down to
the GPU. On reset, we block command submission to the GPU from
the outermost boundary, and then start peeling the layers from
the innermost boundary.

Using kref also forces us to think objectively and set explicit
(clear) dependencies from the outset--i.e. who gets freed and when
and under what circumstances.

And using kref might even expose the wrong dependencies, thus
prompting a redesign and thus a fix.
-- 
Regards,
Luben


      reply	other threads:[~2023-04-11 20:13 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-31  0:06 [PATCH] drm/scheduler: set entity to NULL in drm_sched_entity_pop_job() Danilo Krummrich
2023-03-31  5:59 ` Christian König
2023-03-31 12:57   ` Luben Tuikov
2023-04-05 17:39   ` Luben Tuikov
2023-04-11 18:13     ` Danilo Krummrich
2023-04-11 20:06       ` Luben Tuikov [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=21c789de-e381-cccd-9093-741dcd7a2dc5@amd.com \
    --to=luben.tuikov@amd.com \
    --cc=Vitaly.Prosyak@amd.com \
    --cc=airlied@gmail.com \
    --cc=christian.koenig@amd.com \
    --cc=dakr@redhat.com \
    --cc=daniel@ffwll.ch \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=l.stach@pengutronix.de \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox