From: Matthew Brost <matthew.brost@intel.com>
To: <phasta@kernel.org>
Cc: "Danilo Krummrich" <dakr@kernel.org>,
"Christian König" <ckoenig.leichtzumerken@gmail.com>,
"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
"Maxime Ripard" <mripard@kernel.org>,
"Thomas Zimmermann" <tzimmermann@suse.de>,
"David Airlie" <airlied@gmail.com>,
"Simona Vetter" <simona@ffwll.ch>,
"Sumit Semwal" <sumit.semwal@linaro.org>,
dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v7 3/3] drm/sched: Update timedout_job()'s documentation
Date: Fri, 7 Mar 2025 09:35:39 -0800 [thread overview]
Message-ID: <Z8sua9ySbefnb/JD@lstrano-desk.jf.intel.com> (raw)
In-Reply-To: <0ff8b5ddce856a7d9f5ffbabcb220e345b2dcfaa.camel@mailbox.org>
On Fri, Mar 07, 2025 at 10:37:04AM +0100, Philipp Stanner wrote:
> On Thu, 2025-03-06 at 12:57 -0800, Matthew Brost wrote:
> > On Wed, Mar 05, 2025 at 02:05:52PM +0100, Philipp Stanner wrote:
> > > drm_sched_backend_ops.timedout_job()'s documentation is outdated.
> > > It
> > > mentions the deprecated function drm_sched_resubmit_jobs().
> > > Furthermore,
> > > it does not point out the important distinction between hardware
> > > and
> > > firmware schedulers.
> > >
> > > Since firmware schedulers typically only use one entity per
> > > scheduler,
> > > timeout handling is significantly more simple because the entity
> > > the
> > > faulted job came from can just be killed without affecting innocent
> > > processes.
> > >
> > > Update the documentation with that distinction and other details.
> > >
> > > Reformat the docstring to work to a unified style with the other
> > > handles.
> > >
> >
> > Looks really good, one suggestion.
>
> Already merged. But I'm working already on the TODO and could address
> your feedback in that followup.
>
> Of course, would also be great if you could provide a proposal in a
> patch? :)
>
> >
> > > Signed-off-by: Philipp Stanner <phasta@kernel.org>
> > > ---
> > > include/drm/gpu_scheduler.h | 78 ++++++++++++++++++++++-----------
> > > ----
> > > 1 file changed, 47 insertions(+), 31 deletions(-)
> > >
> > > diff --git a/include/drm/gpu_scheduler.h
> > > b/include/drm/gpu_scheduler.h
> > > index 6381baae8024..1a7e377d4cbb 100644
> > > --- a/include/drm/gpu_scheduler.h
> > > +++ b/include/drm/gpu_scheduler.h
> > > @@ -383,8 +383,15 @@ struct drm_sched_job {
> > > struct xarray dependencies;
> > > };
> > >
> > > +/**
> > > + * enum drm_gpu_sched_stat - the scheduler's status
> > > + *
> > > + * @DRM_GPU_SCHED_STAT_NONE: Reserved. Do not use.
> > > + * @DRM_GPU_SCHED_STAT_NOMINAL: Operation succeeded.
> > > + * @DRM_GPU_SCHED_STAT_ENODEV: Error: Device is not available
> > > anymore.
> > > + */
> > > enum drm_gpu_sched_stat {
> > > - DRM_GPU_SCHED_STAT_NONE, /* Reserve 0 */
> > > + DRM_GPU_SCHED_STAT_NONE,
> > > DRM_GPU_SCHED_STAT_NOMINAL,
> > > DRM_GPU_SCHED_STAT_ENODEV,
> > > };
> > > @@ -447,43 +454,52 @@ struct drm_sched_backend_ops {
> > > * @timedout_job: Called when a job has taken too long to
> > > execute,
> > > * to trigger GPU recovery.
> > > *
> > > - * This method is called in a workqueue context.
> > > + * @sched_job: The job that has timed out
> > > *
> > > - * Drivers typically issue a reset to recover from GPU
> > > hangs, and this
> > > - * procedure usually follows the following workflow:
> > > + * Drivers typically issue a reset to recover from GPU
> > > hangs.
> > > + * This procedure looks very different depending on
> > > whether a firmware
> > > + * or a hardware scheduler is being used.
> > > *
> > > - * 1. Stop the scheduler using drm_sched_stop(). This will
> > > park the
> > > - * scheduler thread and cancel the timeout work,
> > > guaranteeing that
> > > - * nothing is queued while we reset the hardware queue
> > > - * 2. Try to gracefully stop non-faulty jobs (optional)
> > > - * 3. Issue a GPU reset (driver-specific)
> > > - * 4. Re-submit jobs using drm_sched_resubmit_jobs()
> > > - * 5. Restart the scheduler using drm_sched_start(). At
> > > that point, new
> > > - * jobs can be queued, and the scheduler thread is
> > > unblocked
> > > + * For a FIRMWARE SCHEDULER, each ring has one scheduler,
> > > and each
> > > + * scheduler has one entity. Hence, the steps taken
> > > typically look as
> > > + * follows:
> > > + *
> > > + * 1. Stop the scheduler using drm_sched_stop(). This will
> > > pause the
> > > + * scheduler workqueues and cancel the timeout work,
> > > guaranteeing
> > > + * that nothing is queued while the ring is being
> > > removed.
> > > + * 2. Remove the ring. The firmware will make sure that
> > > the
> > > + * corresponding parts of the hardware are resetted,
> > > and that other
> > > + * rings are not impacted.
> > > + * 3. Kill the entity and the associated scheduler.
> >
> > Xe doesn't do step 3.
> >
> > It does:
> > - Ban entity / scheduler so futures submissions are a NOP. This would
> > be
> > submissions with unmet dependencies. Submission at the IOCTL are
> > disallowed
> > - Signal all job's fences on the pending list
> > - Restart scheduler so free_job() is naturally called
> >
> > I'm unsure if this how other firmware schedulers do this, but it
> > seems
> > to work quite well in Xe.
Missed this part of the reply.
>
> Alright, so if I interpret this correctly you do that to avoid our
> infamous memory leaks. That makes sense.
>
Yes.
> The memory leaks are documented in drm_sched_fini()'s docu, but it
> could make sense to mention them here, too.
>
The jobs in Xe ref count the scheduler so we never call drm_sched_fini
until jobs in the pending list and dependency queues has made called
free_job().
> … thinking about it, we probably actually have to rephrase this line.
> Just tearing down entity & sched makes those leaks very likely. Argh.
>
> Nouveau, also a firmware scheduler, has effectively a copy of the
> pending_list and also ensures that all fences get signalled. Only once
> that copy of the pending list is empty it calls into drm_sched_fini().
> Take a look at nouveau_sched.c if you want, the code is quite
> straightforward.
>
Same idea in Xe I think we just directly access the pending access list.
Let me look at what Nouveau is doing before posting an updated doc here
patch.
Matt
> P.
>
> >
> > Matt
> >
> > > + *
> > > + *
> > > + * For a HARDWARE SCHEDULER, a scheduler instance
> > > schedules jobs from
> > > + * one or more entities to one ring. This implies that all
> > > entities
> > > + * associated with the affected scheduler cannot be torn
> > > down, because
> > > + * this would effectively also affect innocent userspace
> > > processes which
> > > + * did not submit faulty jobs (for example).
> > > + *
> > > + * Consequently, the procedure to recover with a hardware
> > > scheduler
> > > + * should look like this:
> > > + *
> > > + * 1. Stop all schedulers impacted by the reset using
> > > drm_sched_stop().
> > > + * 2. Kill the entity the faulty job stems from.
> > > + * 3. Issue a GPU reset on all faulty rings (driver-
> > > specific).
> > > + * 4. Re-submit jobs on all schedulers impacted by re-
> > > submitting them to
> > > + * the entities which are still alive.
> > > + * 5. Restart all schedulers that were stopped in step #1
> > > using
> > > + * drm_sched_start().
> > > *
> > > * Note that some GPUs have distinct hardware queues but
> > > need to reset
> > > * the GPU globally, which requires extra synchronization
> > > between the
> > > - * timeout handler of the different &drm_gpu_scheduler.
> > > One way to
> > > - * achieve this synchronization is to create an ordered
> > > workqueue
> > > - * (using alloc_ordered_workqueue()) at the driver level,
> > > and pass this
> > > - * queue to drm_sched_init(), to guarantee that timeout
> > > handlers are
> > > - * executed sequentially. The above workflow needs to be
> > > slightly
> > > - * adjusted in that case:
> > > + * timeout handlers of different schedulers. One way to
> > > achieve this
> > > + * synchronization is to create an ordered workqueue
> > > (using
> > > + * alloc_ordered_workqueue()) at the driver level, and
> > > pass this queue
> > > + * as drm_sched_init()'s @timeout_wq parameter. This will
> > > guarantee
> > > + * that timeout handlers are executed sequentially.
> > > *
> > > - * 1. Stop all schedulers impacted by the reset using
> > > drm_sched_stop()
> > > - * 2. Try to gracefully stop non-faulty jobs on all queues
> > > impacted by
> > > - * the reset (optional)
> > > - * 3. Issue a GPU reset on all faulty queues (driver-
> > > specific)
> > > - * 4. Re-submit jobs on all schedulers impacted by the
> > > reset using
> > > - * drm_sched_resubmit_jobs()
> > > - * 5. Restart all schedulers that were stopped in step #1
> > > using
> > > - * drm_sched_start()
> > > + * Return: The scheduler's status, defined by &enum
> > > drm_gpu_sched_stat
> > > *
> > > - * Return DRM_GPU_SCHED_STAT_NOMINAL, when all is normal,
> > > - * and the underlying driver has started or completed
> > > recovery.
> > > - *
> > > - * Return DRM_GPU_SCHED_STAT_ENODEV, if the device is no
> > > longer
> > > - * available, i.e. has been unplugged.
> > > */
> > > enum drm_gpu_sched_stat (*timedout_job)(struct
> > > drm_sched_job *sched_job);
> > >
> > > --
> > > 2.48.1
> > >
>
next prev parent reply other threads:[~2025-03-07 17:35 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-05 13:05 [PATCH v7 0/3] drm/sched: Documentation and refcount improvements Philipp Stanner
2025-03-05 13:05 ` [PATCH v7 1/3] drm/sched: Adjust outdated docu for run_job() Philipp Stanner
2025-03-05 13:45 ` Bagas Sanjaya
2025-03-05 14:24 ` Philipp Stanner
2025-03-07 18:09 ` Maíra Canal
2025-03-07 18:17 ` Philipp Stanner
2025-03-05 13:05 ` [PATCH v7 2/3] drm/sched: Document run_job() refcount hazard Philipp Stanner
2025-03-05 13:05 ` [PATCH v7 3/3] drm/sched: Update timedout_job()'s documentation Philipp Stanner
2025-03-06 20:57 ` Matthew Brost
2025-03-07 9:37 ` Philipp Stanner
2025-03-07 17:07 ` Matthew Brost
2025-03-07 17:35 ` Matthew Brost [this message]
2025-03-06 14:28 ` [PATCH v7 0/3] drm/sched: Documentation and refcount improvements Danilo Krummrich
2025-03-06 15:44 ` Philipp Stanner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z8sua9ySbefnb/JD@lstrano-desk.jf.intel.com \
--to=matthew.brost@intel.com \
--cc=airlied@gmail.com \
--cc=ckoenig.leichtzumerken@gmail.com \
--cc=dakr@kernel.org \
--cc=dri-devel@lists.freedesktop.org \
--cc=linux-kernel@vger.kernel.org \
--cc=maarten.lankhorst@linux.intel.com \
--cc=mripard@kernel.org \
--cc=phasta@kernel.org \
--cc=simona@ffwll.ch \
--cc=sumit.semwal@linaro.org \
--cc=tzimmermann@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox