From: Philipp Stanner <pstanner@redhat.com>
To: Matthew Brost <matthew.brost@intel.com>, intel-xe@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org, dakr@kernel.org,
"Christian König" <christian.koenig@amd.com>,
"Alex Deucher" <alexander.deucher@amd.com>
Subject: Re: [PATCH v7 1/9] drm/sched: Add several job helpers to avoid drivers touching scheduler state
Date: Wed, 03 Dec 2025 09:56:45 +0100 [thread overview]
Message-ID: <0b1b4bbf0b49832db2c1e12477c5af55780f39df.camel@redhat.com> (raw)
In-Reply-To: <20251201183954.852637-2-matthew.brost@intel.com>
+Cc Christian, Alex, Danilo
On Mon, 2025-12-01 at 10:39 -0800, Matthew Brost wrote:
> Add helpers to see if scheduler is stopped and a jobs signaled state.
> Expected to be used driver side on recovery and debug flows.
First, thanks for working on this.
This is a big and significant change because it moves towards ending
the 10-year practice of accessing internal locks etc. – I think this
should have a long(er) and detailed commit message aka "In the past
drivers used to … this must end because … to do so we need to provide
those new functions: …"
>
> v4:
> - Reorder patch to first in series (Niranjana)
> - Also check parent fence for signaling (Niranjana)
"We" mostly agreed of not adding changelogs to commit messages anymore
and either have them in the cover letter or in the patche's comment
section below ---
The commit changelog comments are not canonical in the kernel and don't
provide any value IMO.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> ---
> drivers/gpu/drm/scheduler/sched_main.c | 4 ++--
> include/drm/gpu_scheduler.h | 32 ++++++++++++++++++++++++++
> 2 files changed, 34 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 1d4f1b822e7b..cf40c18ab433 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -344,7 +344,7 @@ drm_sched_rq_select_entity_fifo(struct drm_gpu_scheduler *sched,
> */
> static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
> {
> - if (!READ_ONCE(sched->pause_submit))
> + if (!drm_sched_is_stopped(sched))
> queue_work(sched->submit_wq, &sched->work_run_job);
> }
>
> @@ -354,7 +354,7 @@ static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
> */
> static void drm_sched_run_free_queue(struct drm_gpu_scheduler *sched)
> {
> - if (!READ_ONCE(sched->pause_submit))
> + if (!drm_sched_is_stopped(sched))
> queue_work(sched->submit_wq, &sched->work_free_job);
> }
>
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index fb88301b3c45..385bf34e76fe 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -698,4 +698,36 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
> struct drm_gpu_scheduler **sched_list,
> unsigned int num_sched_list);
>
> +/* Inlines */
Surplus comment, everyone immediately sees by the keyword that the
functions are inline.
But why do you want to provide them here instead of in sched_main.c in
the first place?
> +
> +/**
> + * drm_sched_is_stopped() - DRM is stopped
Well no, I doubt the entire DRM subsystem is stopped ;)
"Checks whether drm_sched is stopped"
> + * @sched: DRM scheduler
> + *
> + * Return: True if sched is stopped, False otherwise
> + */
> +static inline bool drm_sched_is_stopped(struct drm_gpu_scheduler *sched)
> +{
> + return READ_ONCE(sched->pause_submit);
I am by the way suspecting since a long time
> +}
> +
> +/**
> + * drm_sched_job_is_signaled() - DRM scheduler job is signaled
> + * @job: DRM scheduler job
> + *
> + * Determine if DRM scheduler job is signaled. DRM scheduler should be stopped
> + * to obtain a stable snapshot of state. Both parent fence (hardware fence) and
> + * finished fence (software fence) are check to determine signaling state.
s/check/checked
I can roughly understand why you need the start/stop checkers for your
list iterator, but what is this function's purpose? The commit message
should explain that.
Do you need them in Xe? Do all drivers need them?
I think it's very cool that you provide this series and are working on
all that, but at XDC I think the important point was that we determined
that AMD and Intel basically do the same trick for GPU resets.
So our desire was not only to prevent folks from accessing the
scheduler's internals, but, ideally, also provide a well documented,
centralized and canonical mechanisms to do GPU resets.
So I think this drm/sched code must be discussed with AMD and we should
see whether it would be sufficient for them, too. And if yes, we need
to properly document that new way of GPU resets and tell users what
those functions are for. The docstrings so far just highlight that
those functions exist and how they are used, but not *why* they exist.
> + *
> + * Return: True if job is signaled, False otherwise
True and False should be lower case I think. At least I've never seen
them upper case in docstrings so far?
P.
> + */
> +static inline bool drm_sched_job_is_signaled(struct drm_sched_job *job)
> +{
> + struct drm_sched_fence *s_fence = job->s_fence;
> +
> + WARN_ON(!drm_sched_is_stopped(job->sched));
> + return (s_fence->parent && dma_fence_is_signaled(s_fence->parent)) ||
> + dma_fence_is_signaled(&s_fence->finished);
> +}
> +
> #endif
next prev parent reply other threads:[~2025-12-03 8:57 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-01 18:39 [PATCH v7 0/9] Fix DRM scheduler layering violations in Xe Matthew Brost
2025-12-01 18:39 ` [PATCH v7 1/9] drm/sched: Add several job helpers to avoid drivers touching scheduler state Matthew Brost
2025-12-03 8:56 ` Philipp Stanner [this message]
2025-12-03 21:10 ` Matthew Brost
2025-12-01 18:39 ` [PATCH v7 2/9] drm/sched: Add pending job list iterator Matthew Brost
2025-12-03 9:07 ` Philipp Stanner
2025-12-03 10:28 ` Philipp Stanner
2025-12-04 16:04 ` Alex Deucher
2025-12-05 9:19 ` Christian König
2025-12-05 18:54 ` Matthew Brost
2025-12-08 13:33 ` Philipp Stanner
2025-12-01 18:39 ` [PATCH v7 3/9] drm/xe: Add dedicated message lock Matthew Brost
2025-12-03 9:38 ` Philipp Stanner
2025-12-01 18:39 ` [PATCH v7 4/9] drm/xe: Stop abusing DRM scheduler internals Matthew Brost
2025-12-03 10:56 ` Philipp Stanner
2025-12-03 20:44 ` Matthew Brost
2025-12-08 13:44 ` Philipp Stanner
2025-12-01 18:39 ` [PATCH v7 5/9] drm/xe: Only toggle scheduling in TDR if GuC is running Matthew Brost
2025-12-01 18:39 ` [PATCH v7 6/9] drm/xe: Do not deregister queues in TDR Matthew Brost
2025-12-01 18:39 ` [PATCH v7 7/9] drm/xe: Remove special casing for LR queues in submission Matthew Brost
2025-12-01 18:39 ` [PATCH v7 8/9] drm/xe: Disable timestamp WA on VFs Matthew Brost
2025-12-02 6:42 ` Umesh Nerlige Ramappa
2025-12-01 18:39 ` [PATCH v7 9/9] drm/xe: Avoid toggling schedule state to check LRC timestamp in TDR Matthew Brost
2025-12-02 7:31 ` Umesh Nerlige Ramappa
2025-12-02 15:14 ` Matthew Brost
2025-12-03 1:23 ` [PATCH v7 0/9] Fix DRM scheduler layering violations in Xe Matthew Brost
2025-12-03 8:33 ` Philipp Stanner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=0b1b4bbf0b49832db2c1e12477c5af55780f39df.camel@redhat.com \
--to=pstanner@redhat.com \
--cc=alexander.deucher@amd.com \
--cc=christian.koenig@amd.com \
--cc=dakr@kernel.org \
--cc=dri-devel@lists.freedesktop.org \
--cc=intel-xe@lists.freedesktop.org \
--cc=matthew.brost@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).