public inbox for linux-kernel@vger.kernel.org
From: Danilo Krummrich <dakr@kernel.org>
To: phasta@kernel.org
Cc: "Tvrtko Ursulin" <tvrtko.ursulin@igalia.com>,
	"Lyude Paul" <lyude@redhat.com>,
	"David Airlie" <airlied@gmail.com>,
	"Simona Vetter" <simona@ffwll.ch>,
	"Matthew Brost" <matthew.brost@intel.com>,
	"Christian König" <ckoenig.leichtzumerken@gmail.com>,
	"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
	"Maxime Ripard" <mripard@kernel.org>,
	"Thomas Zimmermann" <tzimmermann@suse.de>,
	dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/5] drm/sched: Warn if pending list is not empty
Date: Tue, 22 Apr 2025 16:52:54 +0200
Message-ID: <aAetRm3Sbp9vzamg@cassiopeiae>
In-Reply-To: <f0ae2d411c21e799491244fe49880a4acca32918.camel@mailbox.org>

On Tue, Apr 22, 2025 at 04:16:48PM +0200, Philipp Stanner wrote:
> On Tue, 2025-04-22 at 16:08 +0200, Danilo Krummrich wrote:
> > On Tue, Apr 22, 2025 at 02:39:21PM +0100, Tvrtko Ursulin wrote:
> 
> > > > Sorry I don't see the argument for the claim it is relying on the
> > > > internals with the re-positioned drm_sched_fini call. In that case it
> > > > is fully compliant with:
> > > > 
> > > > /**
> > > >  * drm_sched_fini - Destroy a gpu scheduler
> > > >  *
> > > >  * @sched: scheduler instance
> > > >  *
> > > >  * Tears down and cleans up the scheduler.
> > > >  *
> > > >  * This stops submission of new jobs to the hardware through
> > > >  * drm_sched_backend_ops.run_job(). Consequently, drm_sched_backend_ops.free_job()
> > > >  * will not be called for all jobs still in drm_gpu_scheduler.pending_list.
> > > >  * There is no solution for this currently. Thus, it is up to the driver to make
> > > >  * sure that:
> > > >  *
> > > >  *  a) drm_sched_fini() is only called after for all submitted jobs
> > > >  *     drm_sched_backend_ops.free_job() has been called or that
> > > >  *  b) the jobs for which drm_sched_backend_ops.free_job() has not been called
> > > >  *
> > > >  * FIXME: Take care of the above problem and prevent this function from leaking
> > > >  * the jobs in drm_gpu_scheduler.pending_list under any circumstances.
> > > 
> > > ^^^ recommended solution b).
> > 
> > > This has been introduced recently with commit baf4afc58314
> > > ("drm/sched: Improve teardown documentation") and I do not agree with
> > > this. The scheduler should *not* make any promises about implementation
> > > details to enable drivers to abuse their knowledge about component
> > > internals.
> > > 
> > > This makes the problem *worse* as it encourages drivers to rely on
> > > implementation details, making maintainability of the scheduler even
> > > worse.
> > > 
> > > For instance, what if I change the scheduler implementation, such that
> > > for every entry in the pending_list the scheduler allocates another
> > > internal object for ${something}? Then drivers would already fall apart
> > > leaking those internal objects.
> > > 
> > > Now, obviously that's pretty unlikely, but I assume you get the idea.
> > > 
> > > The b) paragraph in drm_sched_fini() should be removed for the given
> > > reasons.
> > > 
> > > AFAICS, since the introduction of this commit, driver implementations
> > > haven't changed in this regard, hence we should be good.
> > > 
> > > So, for me this doesn't change the fact that every driver implementation
> > > that just stops the scheduler at an arbitrary point of time and tries to
> > > clean things up manually relying on knowledge about component internals
> > > is broken.
> 
> To elaborate on that, this documentation has been written so that we at
> least have *some* documentation about the problem, instead of just
> letting new drivers run into the knife.
> 
> The commit explicitly introduced the FIXME, marking those two hacky
> workarounds as undesirable.
> 
> But back then we couldn't fix the problem quickly, so it was either
> document the issue at least a bit, or leave it completely undocumented.

Agreed, but b) really sounds like an invitation (or even justification) for
doing the wrong thing, so let's remove it.
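
To make the concern about b) concrete, here is a minimal userspace sketch of
the hypothetical from the quoted text: a scheduler that keeps one internal
object per pending_list entry. This is a toy model under stated assumptions,
not the drm_sched API. Every toy_* name is invented for illustration, and the
"internal object" is only a counter so that the example itself does not leak
memory.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy stand-ins only; these are NOT the real drm_sched structures. */
struct toy_job {
	struct toy_job *next;
};

struct toy_sched {
	struct toy_job *pending;	/* stands in for pending_list */
	int internal_objs;		/* hypothetical per-entry scheduler state */
};

/* Submission: the scheduler queues the job and creates its internal object. */
static void toy_submit(struct toy_sched *s, struct toy_job *job)
{
	job->next = s->pending;
	s->pending = job;
	s->internal_objs++;
}

/* Driver callback, analogous in spirit to free_job(): frees the job only. */
static void toy_driver_free_job(struct toy_job *job)
{
	free(job);
}

/* Scheduler-owned retirement: releases internal state, then calls the driver. */
static void toy_sched_retire_all(struct toy_sched *s)
{
	while (s->pending) {
		struct toy_job *job = s->pending;

		s->pending = job->next;
		s->internal_objs--;
		toy_driver_free_job(job);
	}
}

/* Option b): the driver walks the list itself and frees what it knows about. */
static void toy_driver_manual_cleanup(struct toy_sched *s)
{
	while (s->pending) {
		struct toy_job *job = s->pending;

		s->pending = job->next;
		/* The scheduler's internal object is invisible here and is lost. */
		toy_driver_free_job(job);
	}
}

/* Teardown: warns on a non-empty list, returns the number of leaked objects. */
static int toy_sched_fini(struct toy_sched *s)
{
	if (s->pending)
		fprintf(stderr, "toy_sched_fini: pending list not empty\n");
	return s->internal_objs;
}

/* Submit njobs jobs, clean up one way or the other, and report the leaks. */
static int toy_teardown(int njobs, int manual)
{
	struct toy_sched s = { 0 };

	assert(njobs >= 0);
	for (int i = 0; i < njobs; i++) {
		struct toy_job *job = calloc(1, sizeof(*job));

		if (!job)
			abort();
		toy_submit(&s, job);
	}

	if (manual)
		toy_driver_manual_cleanup(&s);
	else
		toy_sched_retire_all(&s);

	return toy_sched_fini(&s);
}
```

In this model toy_teardown(3, 0) returns 0, while toy_teardown(3, 1) returns
3: the manual path empties the list and frees every job, yet still leaks the
state only the scheduler knows about.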
