From: Philipp Stanner <pstanner@redhat.com>
To: Matthew Brost <matthew.brost@intel.com>, intel-xe@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org,
"Alex Deucher" <alexander.deucher@amd.com>,
"Christian König" <christian.koenig@amd.com>,
dakr@kernel.org
Subject: Re: [PATCH v7 2/9] drm/sched: Add pending job list iterator
Date: Wed, 03 Dec 2025 10:07:37 +0100 [thread overview]
Message-ID: <0088fe0dd0d62b876d77b0f9e3a1c7586bdc5557.camel@redhat.com> (raw)
In-Reply-To: <20251201183954.852637-3-matthew.brost@intel.com>
+Cc Alex, Christian, Danilo
On Mon, 2025-12-01 at 10:39 -0800, Matthew Brost wrote:
> Stop open coding pending job list in drivers. Add pending job list
> iterator which safely walks DRM scheduler list asserting DRM scheduler
> is stopped.
>
> v2:
> - Fix checkpatch (CI)
> v3:
> - Drop locked version (Christian)
> v4:
> - Reorder patch (Niranjana)
Same with the changelog.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> ---
> include/drm/gpu_scheduler.h | 50 +++++++++++++++++++++++++++++++++++++
> 1 file changed, 50 insertions(+)
>
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 385bf34e76fe..9d228513d06c 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -730,4 +730,54 @@ static inline bool drm_sched_job_is_signaled(struct drm_sched_job *job)
> dma_fence_is_signaled(&s_fence->finished);
> }
>
> +/**
> + * struct drm_sched_pending_job_iter - DRM scheduler pending job iterator state
> + * @sched: DRM scheduler associated with pending job iterator
> + */
> +struct drm_sched_pending_job_iter {
> + struct drm_gpu_scheduler *sched;
> +};
> +
> +/* Drivers should never call this directly */
> +static inline struct drm_sched_pending_job_iter
> +__drm_sched_pending_job_iter_begin(struct drm_gpu_scheduler *sched)
> +{
> + struct drm_sched_pending_job_iter iter = {
> + .sched = sched,
> + };
> +
> + WARN_ON(!drm_sched_is_stopped(sched));
> + return iter;
> +}
> +
> +/* Drivers should never call this directly */
> +static inline void
> +__drm_sched_pending_job_iter_end(const struct drm_sched_pending_job_iter iter)
> +{
> + WARN_ON(!drm_sched_is_stopped(iter.sched));
> +}
> +
> +DEFINE_CLASS(drm_sched_pending_job_iter, struct drm_sched_pending_job_iter,
> + __drm_sched_pending_job_iter_end(_T),
> + __drm_sched_pending_job_iter_begin(__sched),
> + struct drm_gpu_scheduler *__sched);
> +static inline void *
> +class_drm_sched_pending_job_iter_lock_ptr(class_drm_sched_pending_job_iter_t *_T)
> +{ return _T; }
> +#define class_drm_sched_pending_job_iter_is_conditional false
> +
> +/**
> + * drm_sched_for_each_pending_job() - Iterator for each pending job in scheduler
> + * @__job: Current pending job being iterated over
> + * @__sched: DRM scheduler to iterate over pending jobs
> + * @__entity: DRM scheduler entity to filter jobs, NULL indicates no filter
> + *
> + * Iterator for each pending job in scheduler, filtering on an entity, and
> + * enforcing scheduler is fully stopped
> + */
> +#define drm_sched_for_each_pending_job(__job, __sched, __entity) \
> + scoped_guard(drm_sched_pending_job_iter, (__sched)) \
> + list_for_each_entry((__job), &(__sched)->pending_list, list) \
> + for_each_if(!(__entity) || (__job)->entity == (__entity))
> +
> #endif
See my comments in the first patch. The docu doesn't mention at all why
this new functionality exists and when and why users would be expected
to use it.
As far as I remember from XDC, both AMD and Intel overwrite a timed out
jobs buffer data in the rings on GPU reset. To do so, the driver needs
the timedout job (passed through timedout_job() callback) and then
needs all the pending non-broken jobs.
AFAICS your patch provides a generic iterator over the entire
pending_list. How is a driver then supposed to determine which are the
non-broken jobs (just asking, but that needs to be documented)?
Could it make sense to use a different iterator which only returns jobs
of not belonging to the same context as the timedout-one?
Those are important questions that need to be addressed before merging
that.
And if this works canonically (i.e., for basically everyone), it needs
to be documented in drm_sched_resubmit_jobs() that this iterator is now
the canonical way of handling timeouts.
Moreover, btw, just yesterday I added an entry to the DRM todo list
which addresses drm_sched_resubmit_jobs(). If we merge this, that entry
would have to be removed, too.
@AMD: Would the code Matthew provides work for you? Please give your
input. This is very important common infrastructure.
Philipp
next prev parent reply other threads:[~2025-12-03 9:07 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-01 18:39 [PATCH v7 0/9] Fix DRM scheduler layering violations in Xe Matthew Brost
2025-12-01 18:39 ` [PATCH v7 1/9] drm/sched: Add several job helpers to avoid drivers touching scheduler state Matthew Brost
2025-12-03 8:56 ` Philipp Stanner
2025-12-03 21:10 ` Matthew Brost
2025-12-01 18:39 ` [PATCH v7 2/9] drm/sched: Add pending job list iterator Matthew Brost
2025-12-03 9:07 ` Philipp Stanner [this message]
2025-12-03 10:28 ` Philipp Stanner
2025-12-04 16:04 ` Alex Deucher
2025-12-05 9:19 ` Christian König
2025-12-05 18:54 ` Matthew Brost
2025-12-08 13:33 ` Philipp Stanner
2025-12-01 18:39 ` [PATCH v7 3/9] drm/xe: Add dedicated message lock Matthew Brost
2025-12-03 9:38 ` Philipp Stanner
2025-12-01 18:39 ` [PATCH v7 4/9] drm/xe: Stop abusing DRM scheduler internals Matthew Brost
2025-12-03 10:56 ` Philipp Stanner
2025-12-03 20:44 ` Matthew Brost
2025-12-08 13:44 ` Philipp Stanner
2025-12-01 18:39 ` [PATCH v7 5/9] drm/xe: Only toggle scheduling in TDR if GuC is running Matthew Brost
2025-12-01 18:39 ` [PATCH v7 6/9] drm/xe: Do not deregister queues in TDR Matthew Brost
2025-12-01 18:39 ` [PATCH v7 7/9] drm/xe: Remove special casing for LR queues in submission Matthew Brost
2025-12-01 18:39 ` [PATCH v7 8/9] drm/xe: Disable timestamp WA on VFs Matthew Brost
2025-12-02 6:42 ` Umesh Nerlige Ramappa
2025-12-01 18:39 ` [PATCH v7 9/9] drm/xe: Avoid toggling schedule state to check LRC timestamp in TDR Matthew Brost
2025-12-02 7:31 ` Umesh Nerlige Ramappa
2025-12-02 15:14 ` Matthew Brost
2025-12-03 1:23 ` [PATCH v7 0/9] Fix DRM scheduler layering violations in Xe Matthew Brost
2025-12-03 8:33 ` Philipp Stanner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=0088fe0dd0d62b876d77b0f9e3a1c7586bdc5557.camel@redhat.com \
--to=pstanner@redhat.com \
--cc=alexander.deucher@amd.com \
--cc=christian.koenig@amd.com \
--cc=dakr@kernel.org \
--cc=dri-devel@lists.freedesktop.org \
--cc=intel-xe@lists.freedesktop.org \
--cc=matthew.brost@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).