Linux kernel -stable discussions
 help / color / mirror / Atom feed
From: Matthew Brost <matthew.brost@intel.com>
To: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: phasta@kernel.org, dri-devel@lists.freedesktop.org,
	kernel-dev@igalia.com, "Dan Carpenter" <dan.carpenter@linaro.org>,
	"Christian König" <christian.koenig@amd.com>,
	"Rob Clark" <robdclark@chromium.org>,
	"Daniel Vetter" <daniel.vetter@ffwll.ch>,
	"Danilo Krummrich" <dakr@kernel.org>,
	"Christian König" <ckoenig.leichtzumerken@gmail.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH] drm/sched: Fix potential double free in drm_sched_job_add_resv_dependencies
Date: Mon, 13 Oct 2025 10:05:56 -0700	[thread overview]
Message-ID: <aO0xdLWYdPOKE9r2@lstrano-desk.jf.intel.com> (raw)
In-Reply-To: <fa1cf2f6-a82d-46f4-b1ec-b07e678cad76@igalia.com>

On Mon, Oct 13, 2025 at 06:01:49PM +0100, Tvrtko Ursulin wrote:
> 
> On 13/10/2025 15:39, Philipp Stanner wrote:
> > On Fri, 2025-10-03 at 10:26 +0100, Tvrtko Ursulin wrote:
> > > Drm_sched_job_add_dependency() consumes the fence reference both on
> > 
> > s/D/d
> 
> I cannot find the source right now but I am 99% sure someone pulled it on me
> in the past and educated me sentences should always start with a capital
> letter, even when it is not a proper word. Because in the past I was always
> doing the same as you suggest. I don't mind TBH.

Drm_sched_job_add_dependency breaks grep / searches, so I'm with Philipp s/D/d

Matt

> > > success and failure, so in the latter case the dma_fence_put() on the
> > > error path (xarray failed to expand) is a double free.
> > > 
> > > Interestingly this bug appears to have been present ever since
> > > ebd5f74255b9 ("drm/sched: Add dependency tracking"), since the code back
> > > then looked like this:
> > > 
> > > drm_sched_job_add_implicit_dependencies():
> > > ...
> > >         for (i = 0; i < fence_count; i++) {
> > >                 ret = drm_sched_job_add_dependency(job, fences[i]);
> > >                 if (ret)
> > >                         break;
> > >         }
> > > 
> > >         for (; i < fence_count; i++)
> > >                 dma_fence_put(fences[i]);
> > > 
> > > Which means for the failing 'i' the dma_fence_put was already a double
> > > free. Possibly there were no users at that time, or the test cases were
> > > insufficient to hit it.
> > > 
> > > The bug was then only noticed and fixed after
> > > 9c2ba265352a ("drm/scheduler: use new iterator in drm_sched_job_add_implicit_dependencies v2")
> > > landed, with its fixup of
> > > 4eaf02d6076c ("drm/scheduler: fix drm_sched_job_add_implicit_dependencies").
> > > 
> > > At that point it was a slightly different flavour of a double free, which
> > > 963d0b356935 ("drm/scheduler: fix drm_sched_job_add_implicit_dependencies harder")
> > > noticed and attempted to fix.
> > > 
> > > But it only moved the double free from happening inside the
> > > drm_sched_job_add_dependency(), when releasing the reference not yet
> > > obtained, to the caller, when releasing the reference already released by
> > > the former in the failure case.
> > 
> > That's certainly interesting, but is there a specific reason why you
> > include all of that?
> 
> Yes, because bug was attempted to be fixed two times already. So it is IMO
> worth having a write up on the history.
> > The code is as is, and AFAICS it's just a bug stemming from original
> > bugs present and then refactorings happening.
> > 
> > I would at least remove the old 'implicit_dependencies' function from
> > the commit message. It's just confusing and makes one look for that in
> > the current code or patch.It does say "back then" and it references the
> > commit so shouldn't be
> confusing.
> 
> The discussion is long for an additional reason that Fixes: targets not the
> original commit which did not get this right, but the last change which
> tried to fix it but did not. It sounded more logical to continue the chain
> on fixes but dunno. Could replace it with the original one just as well.
> > > As such it is not easy to identify the right target for the fixes tag so
> > > lets keep it simple and just continue the chain.
> > > 
> > > We also drop the misleading comment about additional reference, since it
> > > is not additional but the only one from the point of view of dependency
> > > tracking.
> > 
> > 
> > IMO that comment is nonsense. It's useless, too, because I can *see*
> > that a reference is being taken there, but not *why*.
> > 
> > Argh, these comments. See also my commit 72ebc18b34993
> > 
> > 
> > Anyways. Removing it is fine, but adding a better comment is better.
> > See below.
> > 
> > > 
> > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> > > Fixes: 963d0b356935 ("drm/scheduler: fix drm_sched_job_add_implicit_dependencies harder")
> > > Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
> > 
> > Is there an error report that could be included here with a Closes:
> > tag?No, it did not come from any such system. Could perhaps put a link
> > to
> the report as Reference:
> https://lore.kernel.org/dri-devel/aNFbXq8OeYl3QSdm@stanley.mountain/
> 
> ?
> 
> > > Cc: Christian König <christian.koenig@amd.com>
> > > Cc: Rob Clark <robdclark@chromium.org>
> > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > Cc: Matthew Brost <matthew.brost@intel.com>
> > > Cc: Danilo Krummrich <dakr@kernel.org>
> > > Cc: Philipp Stanner <phasta@kernel.org>
> > > Cc: "Christian König" <ckoenig.leichtzumerken@gmail.com>
> > > Cc: dri-devel@lists.freedesktop.org
> > > Cc: <stable@vger.kernel.org> # v5.16+
> > > ---
> > >   drivers/gpu/drm/scheduler/sched_main.c | 14 +++++---------
> > >   1 file changed, 5 insertions(+), 9 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > > index 46119aacb809..aff34240f230 100644
> > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > @@ -960,20 +960,16 @@ int drm_sched_job_add_resv_dependencies(struct drm_sched_job *job,
> > >   {
> > >   	struct dma_resv_iter cursor;
> > >   	struct dma_fence *fence;
> > > -	int ret;
> > > +	int ret = 0;
> > >   	dma_resv_assert_held(resv);
> > >   	dma_resv_for_each_fence(&cursor, resv, usage, fence) {
> > > -		/* Make sure to grab an additional ref on the added fence */
> > > -		dma_fence_get(fence);
> > > -		ret = drm_sched_job_add_dependency(job, fence);
> > > -		if (ret) {
> > > -			dma_fence_put(fence);
> > > -			return ret;
> > > -		}
> > > +		ret = drm_sched_job_add_dependency(job, dma_fence_get(fence));
> > 
> > You still take a reference as before, but there is no comment anymore.
> > Can you add one explaining why a new reference is taken here?
> > 
> > I guess it will be something like "This needs a new reference for the
> > job", since you cannot rely on the one from resv.
> 
> I was mulling it over but then thought the kerneldoc for
> drm_sched_job_add_dependency() is good enough since it explains the function
> always consumes the reference and dma_resv_for_each_fence() does not hold
> one over the iteration. I can add it.
> 
> > 
> > > +		if (ret)
> > > +			break;
> > >   	}
> > > -	return 0;
> > > +	return ret;
> > 
> > 
> > That's an unnecessarily enlargement of the git diff because of style,
> > isn't it? Better keep the diff minimal here for git blame.
> 
> Given the relative size of the diff versus the function itself it did not
> look like a big deal to reduce to one return statement while touching it,
> but I can keep two returns just as well, no problem.
> 
> Regards,
> 
> Tvrtko
> 

      reply	other threads:[~2025-10-13 17:06 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-03  9:26 [PATCH] drm/sched: Fix potential double free in drm_sched_job_add_resv_dependencies Tvrtko Ursulin
2025-10-13 12:21 ` Tvrtko Ursulin
2025-10-13 14:39 ` Philipp Stanner
2025-10-13 17:01   ` Tvrtko Ursulin
2025-10-13 17:05     ` Matthew Brost [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aO0xdLWYdPOKE9r2@lstrano-desk.jf.intel.com \
    --to=matthew.brost@intel.com \
    --cc=christian.koenig@amd.com \
    --cc=ckoenig.leichtzumerken@gmail.com \
    --cc=dakr@kernel.org \
    --cc=dan.carpenter@linaro.org \
    --cc=daniel.vetter@ffwll.ch \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=kernel-dev@igalia.com \
    --cc=phasta@kernel.org \
    --cc=robdclark@chromium.org \
    --cc=stable@vger.kernel.org \
    --cc=tvrtko.ursulin@igalia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox