From: Danilo Krummrich <dakr@kernel.org>
To: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: phasta@kernel.org, "Lyude Paul" <lyude@redhat.com>,
"David Airlie" <airlied@gmail.com>,
"Simona Vetter" <simona@ffwll.ch>,
"Matthew Brost" <matthew.brost@intel.com>,
"Christian König" <ckoenig.leichtzumerken@gmail.com>,
"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
"Maxime Ripard" <mripard@kernel.org>,
"Thomas Zimmermann" <tzimmermann@suse.de>,
dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/5] drm/sched: Warn if pending list is not empty
Date: Thu, 17 Apr 2025 14:11:14 +0200 [thread overview]
Message-ID: <aADv4ivXZoJpEA7k@pollux> (raw)
In-Reply-To: <3ac34c84-fd84-4598-96e1-239418b7109f@igalia.com>

On Thu, Apr 17, 2025 at 12:27:29PM +0100, Tvrtko Ursulin wrote:
>
> On 17/04/2025 08:45, Philipp Stanner wrote:
> > On Mon, 2025-04-07 at 17:22 +0200, Philipp Stanner wrote:
>
> The problem is exactly that jobs can outlive the entities and the scheduler,
> while some userspace may hold a dma_fence reference to the job via a sync
> file. This new callback would not solve it for xe, but if everything
> required were reference counted, it would.
I think you're mixing up the job and the dma_fence here; if a job outlives the
scheduler, that clearly is a bug, and always has been.
AFAIK, Xe reference counts its driver-specific job structures *and* the driver
specific scheduler structure, such that drm_sched_fini() won't be called before
all jobs have finished.
The reference counting solution, however, potentially creates subsequent
lifetime problems, which I think have already been discussed in a different
thread. Philipp can probably link in the corresponding discussion.
> On the design level it feels like it adds too much state machine and makes
> things hard to follow/maintain. It would be nice to find a solution where
> we end up with less state machine, not more.
Multiple solutions have been discussed already, e.g. just waiting for the
pending list to be empty, or reference counting the scheduler for every pending
job. Those all had significant downsides, which I don't see with this proposal.
I'm all for better ideas though -- what do you propose?
>
> > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > > index 6b72278c4b72..ae3152beca14 100644
> > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > @@ -1465,6 +1465,10 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)
> > >  	sched->ready = false;
> > >  	kfree(sched->sched_rq);
> > >  	sched->sched_rq = NULL;
> > > +
> > > +	if (!list_empty(&sched->pending_list))
> > > +		dev_err(sched->dev, "%s: Tearing down scheduler while jobs are pending!\n",
> > > +			__func__);
>
> It isn't fair to add this error since it would out of the blue start firing
> for everyone except nouveau, no? Regardless of whether there is a leak or not.
I think it is pretty fair to warn when detecting a guaranteed bug, no?
If drm_sched_fini() is called while jobs are still on the pending_list, they
won't ever be freed, because all workqueues are stopped.