Intel-XE Archive on lore.kernel.org
From: Matthew Brost <matthew.brost@intel.com>
To: "Christian König" <christian.koenig@amd.com>
Cc: Alex Deucher <alexdeucher@gmail.com>,
	Philipp Stanner <pstanner@redhat.com>,
	<intel-xe@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>,
	Alex Deucher <alexander.deucher@amd.com>, <dakr@kernel.org>
Subject: Re: [PATCH v7 2/9] drm/sched: Add pending job list iterator
Date: Fri, 5 Dec 2025 10:54:21 -0800	[thread overview]
Message-ID: <aTMqXdD/H3S8TYX8@lstrano-desk.jf.intel.com> (raw)
In-Reply-To: <3f71a73a-8360-409c-8a51-55c37230d4cf@amd.com>

On Fri, Dec 05, 2025 at 10:19:32AM +0100, Christian König wrote:
> On 12/4/25 17:04, Alex Deucher wrote:
> > On Wed, Dec 3, 2025 at 4:24 AM Philipp Stanner <pstanner@redhat.com> wrote:
> >>
> >> +Cc Alex, Christian, Danilo
> >>
> >>
> >> On Mon, 2025-12-01 at 10:39 -0800, Matthew Brost wrote:
> >>> Stop open coding pending job list in drivers. Add pending job list
> >>> iterator which safely walks DRM scheduler list asserting DRM scheduler
> >>> is stopped.
> >>>
> >>> v2:
> >>>  - Fix checkpatch (CI)
> >>> v3:
> >>>  - Drop locked version (Christian)
> >>> v4:
> >>>  - Reorder patch (Niranjana)
> >>
> >> Same with the changelog.
> >>
> >>>
> >>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> >>> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> >>> ---
> >>>  include/drm/gpu_scheduler.h | 50 +++++++++++++++++++++++++++++++++++++
> >>>  1 file changed, 50 insertions(+)
> >>>
> >>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> >>> index 385bf34e76fe..9d228513d06c 100644
> >>> --- a/include/drm/gpu_scheduler.h
> >>> +++ b/include/drm/gpu_scheduler.h
> >>> @@ -730,4 +730,54 @@ static inline bool drm_sched_job_is_signaled(struct drm_sched_job *job)
> >>>               dma_fence_is_signaled(&s_fence->finished);
> >>>  }
> >>>
> >>> +/**
> >>> + * struct drm_sched_pending_job_iter - DRM scheduler pending job iterator state
> >>> + * @sched: DRM scheduler associated with pending job iterator
> >>> + */
> >>> +struct drm_sched_pending_job_iter {
> >>> +     struct drm_gpu_scheduler *sched;
> >>> +};
> >>> +
> >>> +/* Drivers should never call this directly */
> >>> +static inline struct drm_sched_pending_job_iter
> >>> +__drm_sched_pending_job_iter_begin(struct drm_gpu_scheduler *sched)
> >>> +{
> >>> +     struct drm_sched_pending_job_iter iter = {
> >>> +             .sched = sched,
> >>> +     };
> >>> +
> >>> +     WARN_ON(!drm_sched_is_stopped(sched));
> >>> +     return iter;
> >>> +}
> >>> +
> >>> +/* Drivers should never call this directly */
> >>> +static inline void
> >>> +__drm_sched_pending_job_iter_end(const struct drm_sched_pending_job_iter iter)
> >>> +{
> >>> +     WARN_ON(!drm_sched_is_stopped(iter.sched));
> >>> +}
> >>> +
> >>> +DEFINE_CLASS(drm_sched_pending_job_iter, struct drm_sched_pending_job_iter,
> >>> +          __drm_sched_pending_job_iter_end(_T),
> >>> +          __drm_sched_pending_job_iter_begin(__sched),
> >>> +          struct drm_gpu_scheduler *__sched);
> >>> +static inline void *
> >>> +class_drm_sched_pending_job_iter_lock_ptr(class_drm_sched_pending_job_iter_t *_T)
> >>> +{ return _T; }
> >>> +#define class_drm_sched_pending_job_iter_is_conditional false
> >>> +
> >>> +/**
> >>> + * drm_sched_for_each_pending_job() - Iterator for each pending job in scheduler
> >>> + * @__job: Current pending job being iterated over
> >>> + * @__sched: DRM scheduler to iterate over pending jobs
> >>> + * @__entity: DRM scheduler entity to filter jobs, NULL indicates no filter
> >>> + *
> >>> + * Iterator for each pending job in scheduler, filtering on an entity, and
> >>> + * enforcing scheduler is fully stopped
> >>> + */
> >>> +#define drm_sched_for_each_pending_job(__job, __sched, __entity)             \
> >>> +     scoped_guard(drm_sched_pending_job_iter, (__sched))                     \
> >>> +             list_for_each_entry((__job), &(__sched)->pending_list, list)    \
> >>> +                     for_each_if(!(__entity) || (__job)->entity == (__entity))
> >>> +
> >>>  #endif
> >>
> >>
> >> See my comments in the first patch. The docu doesn't mention at all why
> >> this new functionality exists and when and why users would be expected
> >> to use it.
> >>
> >> As far as I remember from XDC, both AMD and Intel overwrite a timed-out
> >> job's buffer data in the rings on GPU reset. To do so, the driver needs
> >> the timed-out job (passed through the timedout_job() callback) and then
> >> needs all the pending non-broken jobs.
> >>
> >> AFAICS your patch provides a generic iterator over the entire
> >> pending_list. How is a driver then supposed to determine which are the
> >> non-broken jobs (just asking, but that needs to be documented)?
> >>
> >> Could it make sense to use a different iterator which only returns jobs
> >> not belonging to the same context as the timed-out one?
> >>
> >> Those are important questions that need to be addressed before merging
> >> that.
> >>
> >> And if this works canonically (i.e., for basically everyone), it needs
> >> to be documented in drm_sched_resubmit_jobs() that this iterator is now
> >> the canonical way of handling timeouts.
> >>
> >> Moreover, btw, just yesterday I added an entry to the DRM todo list
> >> which addresses drm_sched_resubmit_jobs(). If we merge this, that entry
> >> would have to be removed, too.
> >>
> >>
> >> @AMD: Would the code Matthew provides work for you? Please give your
> >> input. This is very important common infrastructure.
> > 
> > I don't think drm_sched_resubmit_jobs() can work for us without major
> > rework.  For our kernel queues, we have a single queue on which jobs
> > for different clients are scheduled.  When we reset the queue, we lose
> > all jobs on the queue and have to re-emit the non-guilty ones.  We do
> > this at the ring level, i.e., we save the packets directly from the
> > ring and then re-emit the packets for the non-guilty contexts to the
> > freshly reset ring.  This avoids running run_job() again which would
> > issue new fences and race with memory management, etc.
> > 
> > I think the following would be workable:
> > 1. driver job_timedout() callback flags the job as bad, resets the bad
> > queue, and calls drm_sched_resubmit_jobs()
> > 2. drm_sched_resubmit_jobs() walks the pending list and calls
> > run_job() for every job
> 
> Calling run_job() multiple times was one of the worst ideas I have ever seen.
> 

I'm not sure who this is directed at but maybe dial back the intensity a
bit here. I really doubt this is one of the worst ideas you've ever
seen.

> The problem here is that you need a transactional approach to the internal driver state which is modified by ->run_job().
> 
> So for example if you have:
> ->run_job(A)
> ->run_job(B)
> ->run_job(C)
> 
> And if after a reset you find that you need to re-submit only job B, and A & C are filtered out, then your driver state needs to get back to where it was before running job A.
> 

This scenario isn't possible in Xe due to the 1:1 relationship between
the scheduler and the entity. Jobs execute serially on a single ring, so
if B needs to be resubmitted, so does C. I'm not sure how AMDGPU works,
but if this scenario can occur it seems like either a significant
problem or a misuse of the scheduler, as AFAIK jobs submitted to a
scheduler instance should signal in order.

> > 2. driver run_job() callback looks to see if we already ran this job
> > and uses the original fence rather than allocating a new one
> 
> Nope, the problem is *all* drivers *must* use the original fence. Otherwise you always run into trouble.
> 

Isn't that what Alex is saying, that the original fence must always be
used? I fully agree with this; Xe does the same. That's why
drm_sched_resubmit_jobs() is confusing: the way it's coded assumes
run_job() can return a different fence than the original invocation
did. This is part of the reason I didn't use this function in Xe; it
looked scary.

> We should not promote a driver interface which makes it extremely easy to shoot down the whole system.
> 
> > 3. driver run_job() callback checks to see if the job is guilty or
> > from the same context and if so, sets an error on the fences and
> > submits only the fence packet to the queue so that any follow up jobs
> > will properly synchronize if they need to wait on the fence from the
> > bad job.
> > 4. driver run_job() callback will submit the full packet stream for
> > non-guilty contexts
> > 
> > I guess we could use the iterator and implement that logic in the
> > driver directly rather than using drm_sched_resubmit_jobs().
> 
> Yeah, exactly that's the way to go.
> 

I think we're in agreement, then, that this patch can be rebased to
address the comments here and then be merged?

Matt

> Christian.
> 
> > 
> > Alex
> 
