public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Matthew Brost <matthew.brost@intel.com>
To: <phasta@kernel.org>
Cc: "Danilo Krummrich" <dakr@kernel.org>,
	"Christian König" <ckoenig.leichtzumerken@gmail.com>,
	"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
	"Maxime Ripard" <mripard@kernel.org>,
	"Thomas Zimmermann" <tzimmermann@suse.de>,
	"David Airlie" <airlied@gmail.com>,
	"Simona Vetter" <simona@ffwll.ch>,
	"Sumit Semwal" <sumit.semwal@linaro.org>,
	dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v7 3/3] drm/sched: Update timedout_job()'s documentation
Date: Fri, 7 Mar 2025 09:35:39 -0800	[thread overview]
Message-ID: <Z8sua9ySbefnb/JD@lstrano-desk.jf.intel.com> (raw)
In-Reply-To: <0ff8b5ddce856a7d9f5ffbabcb220e345b2dcfaa.camel@mailbox.org>

On Fri, Mar 07, 2025 at 10:37:04AM +0100, Philipp Stanner wrote:
> On Thu, 2025-03-06 at 12:57 -0800, Matthew Brost wrote:
> > On Wed, Mar 05, 2025 at 02:05:52PM +0100, Philipp Stanner wrote:
> > > drm_sched_backend_ops.timedout_job()'s documentation is outdated.
> > > It
> > > mentions the deprecated function drm_sched_resubmit_jobs().
> > > Furthermore,
> > > it does not point out the important distinction between hardware
> > > and
> > > firmware schedulers.
> > > 
> > > Since firmware schedulers typically only use one entity per
> > > scheduler,
> > > timeout handling is significantly more simple because the entity
> > > the
> > > faulted job came from can just be killed without affecting innocent
> > > processes.
> > > 
> > > Update the documentation with that distinction and other details.
> > > 
> > > Reformat the docstring to work to a unified style with the other
> > > handles.
> > > 
> > 
> > Looks really good, one suggestion.
> 
> Already merged. But I'm working already on the TODO and could address
> your feedback in that followup.
> 
> Of course, would also be great if you could provide a proposal in a
> patch? :)
> 
> > 
> > > Signed-off-by: Philipp Stanner <phasta@kernel.org>
> > > ---
> > >  include/drm/gpu_scheduler.h | 78 ++++++++++++++++++++++-----------
> > > ----
> > >  1 file changed, 47 insertions(+), 31 deletions(-)
> > > 
> > > diff --git a/include/drm/gpu_scheduler.h
> > > b/include/drm/gpu_scheduler.h
> > > index 6381baae8024..1a7e377d4cbb 100644
> > > --- a/include/drm/gpu_scheduler.h
> > > +++ b/include/drm/gpu_scheduler.h
> > > @@ -383,8 +383,15 @@ struct drm_sched_job {
> > >  	struct xarray			dependencies;
> > >  };
> > >  
> > > +/**
> > > + * enum drm_gpu_sched_stat - the scheduler's status
> > > + *
> > > + * @DRM_GPU_SCHED_STAT_NONE: Reserved. Do not use.
> > > + * @DRM_GPU_SCHED_STAT_NOMINAL: Operation succeeded.
> > > + * @DRM_GPU_SCHED_STAT_ENODEV: Error: Device is not available
> > > anymore.
> > > + */
> > >  enum drm_gpu_sched_stat {
> > > -	DRM_GPU_SCHED_STAT_NONE, /* Reserve 0 */
> > > +	DRM_GPU_SCHED_STAT_NONE,
> > >  	DRM_GPU_SCHED_STAT_NOMINAL,
> > >  	DRM_GPU_SCHED_STAT_ENODEV,
> > >  };
> > > @@ -447,43 +454,52 @@ struct drm_sched_backend_ops {
> > >  	 * @timedout_job: Called when a job has taken too long to
> > > execute,
> > >  	 * to trigger GPU recovery.
> > >  	 *
> > > -	 * This method is called in a workqueue context.
> > > +	 * @sched_job: The job that has timed out
> > >  	 *
> > > -	 * Drivers typically issue a reset to recover from GPU
> > > hangs, and this
> > > -	 * procedure usually follows the following workflow:
> > > +	 * Drivers typically issue a reset to recover from GPU
> > > hangs.
> > > +	 * This procedure looks very different depending on
> > > whether a firmware
> > > +	 * or a hardware scheduler is being used.
> > >  	 *
> > > -	 * 1. Stop the scheduler using drm_sched_stop(). This will
> > > park the
> > > -	 *    scheduler thread and cancel the timeout work,
> > > guaranteeing that
> > > -	 *    nothing is queued while we reset the hardware queue
> > > -	 * 2. Try to gracefully stop non-faulty jobs (optional)
> > > -	 * 3. Issue a GPU reset (driver-specific)
> > > -	 * 4. Re-submit jobs using drm_sched_resubmit_jobs()
> > > -	 * 5. Restart the scheduler using drm_sched_start(). At
> > > that point, new
> > > -	 *    jobs can be queued, and the scheduler thread is
> > > unblocked
> > > +	 * For a FIRMWARE SCHEDULER, each ring has one scheduler,
> > > and each
> > > +	 * scheduler has one entity. Hence, the steps taken
> > > typically look as
> > > +	 * follows:
> > > +	 *
> > > +	 * 1. Stop the scheduler using drm_sched_stop(). This will
> > > pause the
> > > +	 *    scheduler workqueues and cancel the timeout work,
> > > guaranteeing
> > > +	 *    that nothing is queued while the ring is being
> > > removed.
> > > +	 * 2. Remove the ring. The firmware will make sure that
> > > the
> > > +	 *    corresponding parts of the hardware are resetted,
> > > and that other
> > > +	 *    rings are not impacted.
> > > +	 * 3. Kill the entity and the associated scheduler.
> > 
> > Xe doesn't do step 3.
> > 
> > It does:
> > - Ban entity / scheduler so futures submissions are a NOP. This would
> > be
> >   submissions with unmet dependencies. Submission at the IOCTL are
> >   disallowed 
> > - Signal all job's fences on the pending list
> > - Restart scheduler so free_job() is naturally called
> > 
> > I'm unsure if this how other firmware schedulers do this, but it
> > seems
> > to work quite well in Xe.

Missed this part of the reply.

> 
> Alright, so if I interpret this correctly you do that to avoid our
> infamous memory leaks. That makes sense.
>

Yes.
 
> The memory leaks are documented in drm_sched_fini()'s docu, but it
> could make sense to mention them here, too.
> 

The jobs in Xe ref count the scheduler so we never call drm_sched_fini
until jobs in the pending list and dependency queues has made called
free_job().

> … thinking about it, we probably actually have to rephrase this line.
> Just tearing down entity & sched makes those leaks very likely. Argh.
> 
> Nouveau, also a firmware scheduler, has effectively a copy of the
> pending_list and also ensures that all fences get signalled. Only once
> that copy of the pending list is empty it calls into drm_sched_fini().
> Take a look at nouveau_sched.c if you want, the code is quite
> straightforward.
> 

Same idea in Xe I think we just directly access the pending access list.
Let me look at what Nouveau is doing before posting an updated doc here
patch.

Matt

> P.
> 
> > 
> > Matt
> > 
> > > +	 *
> > > +	 *
> > > +	 * For a HARDWARE SCHEDULER, a scheduler instance
> > > schedules jobs from
> > > +	 * one or more entities to one ring. This implies that all
> > > entities
> > > +	 * associated with the affected scheduler cannot be torn
> > > down, because
> > > +	 * this would effectively also affect innocent userspace
> > > processes which
> > > +	 * did not submit faulty jobs (for example).
> > > +	 *
> > > +	 * Consequently, the procedure to recover with a hardware
> > > scheduler
> > > +	 * should look like this:
> > > +	 *
> > > +	 * 1. Stop all schedulers impacted by the reset using
> > > drm_sched_stop().
> > > +	 * 2. Kill the entity the faulty job stems from.
> > > +	 * 3. Issue a GPU reset on all faulty rings (driver-
> > > specific).
> > > +	 * 4. Re-submit jobs on all schedulers impacted by re-
> > > submitting them to
> > > +	 *    the entities which are still alive.
> > > +	 * 5. Restart all schedulers that were stopped in step #1
> > > using
> > > +	 *    drm_sched_start().
> > >  	 *
> > >  	 * Note that some GPUs have distinct hardware queues but
> > > need to reset
> > >  	 * the GPU globally, which requires extra synchronization
> > > between the
> > > -	 * timeout handler of the different &drm_gpu_scheduler.
> > > One way to
> > > -	 * achieve this synchronization is to create an ordered
> > > workqueue
> > > -	 * (using alloc_ordered_workqueue()) at the driver level,
> > > and pass this
> > > -	 * queue to drm_sched_init(), to guarantee that timeout
> > > handlers are
> > > -	 * executed sequentially. The above workflow needs to be
> > > slightly
> > > -	 * adjusted in that case:
> > > +	 * timeout handlers of different schedulers. One way to
> > > achieve this
> > > +	 * synchronization is to create an ordered workqueue
> > > (using
> > > +	 * alloc_ordered_workqueue()) at the driver level, and
> > > pass this queue
> > > +	 * as drm_sched_init()'s @timeout_wq parameter. This will
> > > guarantee
> > > +	 * that timeout handlers are executed sequentially.
> > >  	 *
> > > -	 * 1. Stop all schedulers impacted by the reset using
> > > drm_sched_stop()
> > > -	 * 2. Try to gracefully stop non-faulty jobs on all queues
> > > impacted by
> > > -	 *    the reset (optional)
> > > -	 * 3. Issue a GPU reset on all faulty queues (driver-
> > > specific)
> > > -	 * 4. Re-submit jobs on all schedulers impacted by the
> > > reset using
> > > -	 *    drm_sched_resubmit_jobs()
> > > -	 * 5. Restart all schedulers that were stopped in step #1
> > > using
> > > -	 *    drm_sched_start()
> > > +	 * Return: The scheduler's status, defined by &enum
> > > drm_gpu_sched_stat
> > >  	 *
> > > -	 * Return DRM_GPU_SCHED_STAT_NOMINAL, when all is normal,
> > > -	 * and the underlying driver has started or completed
> > > recovery.
> > > -	 *
> > > -	 * Return DRM_GPU_SCHED_STAT_ENODEV, if the device is no
> > > longer
> > > -	 * available, i.e. has been unplugged.
> > >  	 */
> > >  	enum drm_gpu_sched_stat (*timedout_job)(struct
> > > drm_sched_job *sched_job);
> > >  
> > > -- 
> > > 2.48.1
> > > 
> 

  parent reply	other threads:[~2025-03-07 17:35 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-05 13:05 [PATCH v7 0/3] drm/sched: Documentation and refcount improvements Philipp Stanner
2025-03-05 13:05 ` [PATCH v7 1/3] drm/sched: Adjust outdated docu for run_job() Philipp Stanner
2025-03-05 13:45   ` Bagas Sanjaya
2025-03-05 14:24     ` Philipp Stanner
2025-03-07 18:09   ` Maíra Canal
2025-03-07 18:17     ` Philipp Stanner
2025-03-05 13:05 ` [PATCH v7 2/3] drm/sched: Document run_job() refcount hazard Philipp Stanner
2025-03-05 13:05 ` [PATCH v7 3/3] drm/sched: Update timedout_job()'s documentation Philipp Stanner
2025-03-06 20:57   ` Matthew Brost
2025-03-07  9:37     ` Philipp Stanner
2025-03-07 17:07       ` Matthew Brost
2025-03-07 17:35       ` Matthew Brost [this message]
2025-03-06 14:28 ` [PATCH v7 0/3] drm/sched: Documentation and refcount improvements Danilo Krummrich
2025-03-06 15:44   ` Philipp Stanner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z8sua9ySbefnb/JD@lstrano-desk.jf.intel.com \
    --to=matthew.brost@intel.com \
    --cc=airlied@gmail.com \
    --cc=ckoenig.leichtzumerken@gmail.com \
    --cc=dakr@kernel.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maarten.lankhorst@linux.intel.com \
    --cc=mripard@kernel.org \
    --cc=phasta@kernel.org \
    --cc=simona@ffwll.ch \
    --cc=sumit.semwal@linaro.org \
    --cc=tzimmermann@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox