From: Philipp Stanner <pstanner@redhat.com>
To: "Christian König" <christian.koenig@amd.com>,
"Alex Deucher" <alexdeucher@gmail.com>
Cc: Matthew Brost <matthew.brost@intel.com>,
intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
Alex Deucher <alexander.deucher@amd.com>,
dakr@kernel.org
Subject: Re: [PATCH v7 2/9] drm/sched: Add pending job list iterator
Date: Mon, 08 Dec 2025 14:33:37 +0100 [thread overview]
Message-ID: <aeaced5616f80e4e59f6cbdce77ab22365ce1901.camel@redhat.com> (raw)
In-Reply-To: <3f71a73a-8360-409c-8a51-55c37230d4cf@amd.com>

On Fri, 2025-12-05 at 10:19 +0100, Christian König wrote:
> On 12/4/25 17:04, Alex Deucher wrote:
> > On Wed, Dec 3, 2025 at 4:24 AM Philipp Stanner <pstanner@redhat.com> wrote:
> > >
> > > +Cc Alex, Christian, Danilo
> > >
> > >
> > > On Mon, 2025-12-01 at 10:39 -0800, Matthew Brost wrote:
> > > > Stop open coding pending job list in drivers. Add pending job list
> > > > iterator which safely walks DRM scheduler list asserting DRM scheduler
> > > > is stopped.
> > > >
> > > > v2:
> > > > - Fix checkpatch (CI)
> > > > v3:
> > > > - Drop locked version (Christian)
> > > > v4:
> > > > - Reorder patch (Niranjana)
> > >
> > > Same with the changelog.
> > >
> > > >
> > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> > > > ---
> > > > include/drm/gpu_scheduler.h | 50 +++++++++++++++++++++++++++++++++++++
> > > > 1 file changed, 50 insertions(+)
> > > >
> > > > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > > > index 385bf34e76fe..9d228513d06c 100644
> > > > --- a/include/drm/gpu_scheduler.h
> > > > +++ b/include/drm/gpu_scheduler.h
> > > > @@ -730,4 +730,54 @@ static inline bool drm_sched_job_is_signaled(struct drm_sched_job *job)
> > > > dma_fence_is_signaled(&s_fence->finished);
> > > > }
> > > >
> > > > +/**
> > > > + * struct drm_sched_pending_job_iter - DRM scheduler pending job iterator state
> > > > + * @sched: DRM scheduler associated with pending job iterator
> > > > + */
> > > > +struct drm_sched_pending_job_iter {
> > > > + struct drm_gpu_scheduler *sched;
> > > > +};
> > > > +
> > > > +/* Drivers should never call this directly */
> > > > +static inline struct drm_sched_pending_job_iter
> > > > +__drm_sched_pending_job_iter_begin(struct drm_gpu_scheduler *sched)
> > > > +{
> > > > + struct drm_sched_pending_job_iter iter = {
> > > > + .sched = sched,
> > > > + };
> > > > +
> > > > + WARN_ON(!drm_sched_is_stopped(sched));
> > > > + return iter;
> > > > +}
> > > > +
> > > > +/* Drivers should never call this directly */
> > > > +static inline void
> > > > +__drm_sched_pending_job_iter_end(const struct drm_sched_pending_job_iter iter)
> > > > +{
> > > > + WARN_ON(!drm_sched_is_stopped(iter.sched));
> > > > +}
> > > > +
> > > > +DEFINE_CLASS(drm_sched_pending_job_iter, struct drm_sched_pending_job_iter,
> > > > + __drm_sched_pending_job_iter_end(_T),
> > > > + __drm_sched_pending_job_iter_begin(__sched),
> > > > + struct drm_gpu_scheduler *__sched);
> > > > +static inline void *
> > > > +class_drm_sched_pending_job_iter_lock_ptr(class_drm_sched_pending_job_iter_t *_T)
> > > > +{ return _T; }
> > > > +#define class_drm_sched_pending_job_iter_is_conditional false
> > > > +
> > > > +/**
> > > > + * drm_sched_for_each_pending_job() - Iterator for each pending job in scheduler
> > > > + * @__job: Current pending job being iterated over
> > > > + * @__sched: DRM scheduler to iterate over pending jobs
> > > > + * @__entity: DRM scheduler entity to filter jobs, NULL indicates no filter
> > > > + *
> > > > + * Iterator for each pending job in scheduler, filtering on an entity, and
> > > > + * enforcing scheduler is fully stopped
> > > > + */
> > > > +#define drm_sched_for_each_pending_job(__job, __sched, __entity) \
> > > > + scoped_guard(drm_sched_pending_job_iter, (__sched)) \
> > > > + list_for_each_entry((__job), &(__sched)->pending_list, list) \
> > > > + for_each_if(!(__entity) || (__job)->entity == (__entity))
> > > > +
> > > > #endif
> > >
> > >
> > > See my comments in the first patch. The documentation doesn't mention at
> > > all why this new functionality exists, or when and why users would be
> > > expected to use it.
> > >
> > > As far as I remember from XDC, both AMD and Intel overwrite a timed-out
> > > job's buffer data in the rings on GPU reset. To do so, the driver needs
> > > the timed-out job (passed in through the timedout_job() callback) and
> > > then needs all the pending non-broken jobs.
> > >
> > > AFAICS your patch provides a generic iterator over the entire
> > > pending_list. How is a driver then supposed to determine which are the
> > > non-broken jobs (just asking, but that needs to be documented)?
> > >
> > > Could it make sense to use a different iterator which only returns jobs
> > > not belonging to the same context as the timed-out one?
> > >
> > > Those are important questions that need to be addressed before merging
> > > that.
> > >
> > > And if this works canonically (i.e., for basically everyone), it needs
> > > to be documented in drm_sched_resubmit_jobs() that this iterator is now
> > > the canonical way of handling timeouts.
> > >
> > > Moreover, btw, just yesterday I added an entry to the DRM todo list
> > > which addresses drm_sched_resubmit_jobs(). If we merge this, that entry
> > > would have to be removed, too.
> > >
> > >
> > > @AMD: Would the code Matthew provides work for you? Please give your
> > > input. This is very important common infrastructure.
> >
> > I don't think drm_sched_resubmit_jobs() can work for us without major
> > rework. For our kernel queues, we have a single queue on which jobs
> > for different clients are scheduled. When we reset the queue, we lose
> > all jobs on the queue and have to re-emit the non-guilty ones. We do
> > this at the ring level, i.e., we save the packets directly from the
> > ring and then re-emit the packets for the non-guilty contexts to the
> > freshly reset ring. This avoids running run_job() again which would
> > issue new fences and race with memory management, etc.
> >
> > I think the following would be workable:
> > 1. driver job_timedout() callback flags the job as bad, resets the bad
> > queue, and calls drm_sched_resubmit_jobs()
> > 2. drm_sched_resubmit_jobs() walks the pending list and calls
> > run_job() for every job
>
> Calling run_job() multiple times was one of the worst ideas I have ever seen.
>
> The problem here is that you need a transactional approach to the internal driver state which is modified by ->run_job().
>
> So for example if you have:
> ->run_job(A)
> ->run_job(B)
> ->run_job(C)
>
> And if, after a reset, you find that you need to re-submit only job B while A & C are filtered out, then that means that your driver state needs to be rolled back to before job A was run.
>
> > 3. driver run_job() callback looks to see if we already ran this job
> > and uses the original fence rather than allocating a new one
>
> Nope, the problem is that *all* drivers *must* use the original fence. Otherwise you always run into trouble.
>
> We should not promote a driver interface which makes it extremely easy to shoot down the whole system.
>
> > 4. driver run_job() callback checks to see if the job is guilty or
> > from the same context and, if so, sets an error on the fences and
> > submits only the fence packet to the queue so that any follow-up jobs
> > will properly synchronize if they need to wait on the fence from the
> > bad job.
> > 5. driver run_job() callback will submit the full packet stream for
> > non-guilty contexts
> >
> > I guess we could use the iterator and implement that logic in the
> > driver directly rather than using drm_sched_resubmit_jobs().
>
> Yeah, exactly that's the way to go.

Sorry, I guess my message was confusing.

I don't mean for anyone to use drm_sched_resubmit_jobs() at all. That
function is deprecated, and since there are still users, we can't
easily modify it.

I only mentioned that function because its docstring should tell users
what to do *instead* of calling it. Currently, it's just marked as
deprecated without pointing them to an alternative.
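
Something along these lines in the kernel-doc would already help (the
wording is just a suggestion, feel free to phrase it differently):

 * This function is deprecated. Drivers should instead stop the
 * scheduler, walk the pending list with
 * drm_sched_for_each_pending_job() and re-emit only the jobs they
 * consider innocent, reusing the original hardware fences.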

Anyway.

So, can you take a look at Matthew's iterator and see whether it would
work for you?
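
For reference, this is roughly how I'd picture the iterator being used
in a driver's reset path. Completely untested sketch; the my_ring_*()
helpers are made-up placeholders and the guilty check is only an
example:

#include <drm/gpu_scheduler.h>
#include <linux/dma-fence.h>

/* Made-up driver helpers, placeholders for this sketch only: */
void my_ring_emit_fence_only(struct drm_sched_job *job);
void my_ring_reemit_packets(struct drm_sched_job *job);

/*
 * Re-emit all innocent pending jobs after a queue reset. Assumes the
 * scheduler is already fully stopped (the iterator WARNs otherwise)
 * and that @bad is the job handed to the timedout_job() callback.
 */
static void my_ring_reemit_innocent_jobs(struct drm_gpu_scheduler *sched,
					 struct drm_sched_job *bad)
{
	struct drm_sched_job *job;

	drm_sched_for_each_pending_job(job, sched, NULL) {
		if (job == bad || job->entity == bad->entity) {
			/*
			 * Guilty context: flag the fence with an error and
			 * emit only the fence packet, so that follow-up
			 * jobs waiting on it still synchronize.
			 */
			dma_fence_set_error(&job->s_fence->finished,
					    -ECANCELED);
			my_ring_emit_fence_only(job);
			continue;
		}

		/*
		 * Innocent job: re-emit the packets saved from the old
		 * ring, reusing the original hardware fence instead of
		 * going through run_job() again.
		 */
		my_ring_reemit_packets(job);
	}
}

The point being that the resubmit policy then lives entirely in the
driver, and the scheduler only guarantees that the pending list can be
walked safely while it is stopped.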

If we ended up with an at least mostly generic solution for
resubmission that is used by both Xe and amdgpu, that would be a huge
leap forward.

P.
>
> Christian.
>
> >
> > Alex
>