Re: [RFC v4 05/16] drm/sched: Consolidate drm_sched_job_timedout

From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
To: phasta@kernel.org, amd-gfx@lists.freedesktop.org,
	dri-devel@lists.freedesktop.org
Cc: kernel-dev@igalia.com,
	"Christian König" <christian.koenig@amd.com>,
	"Danilo Krummrich" <dakr@kernel.org>,
	"Matthew Brost" <matthew.brost@intel.com>,
	"Maíra Canal" <mcanal@igalia.com>
Subject: Re: [RFC v4 05/16] drm/sched: Consolidate drm_sched_job_timedout
Date: Wed, 14 May 2025 09:57:04 +0100
Message-ID: <eae2623f-65db-42db-9c6e-acc76bd50423@igalia.com>
In-Reply-To: <207366049668e3df24ac81cd9f2e07bc1a2358ad.camel@mailbox.org>


On 12/05/2025 13:53, Philipp Stanner wrote:
> On Fri, 2025-04-25 at 11:20 +0100, Tvrtko Ursulin wrote:
>> Reduce to one spin_unlock for hopefully a little bit clearer flow in the
>> function. It may appear that there is a behavioural change with the
>> drm_sched_start_timeout_unlocked() now not being called if there were
>> initially no jobs on the pending list, and then some appeared after
>> unlock, however if the code would rely on the TDR handler restarting
>> itself then it would fail to do that if the job arrived on the pending
>> list after the check.
>>
>> Also fix one stale comment while touching the function.
> 
> Same here, that's a good candidate for a separate patch / series.

It conflicts with the in-progress work from Maíra (fixing memory leaks
on false timeouts), so I will keep this one on the back burner until her
work lands.

Regards,

Tvrtko

>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Danilo Krummrich <dakr@kernel.org>
>> Cc: Matthew Brost <matthew.brost@intel.com>
>> Cc: Philipp Stanner <phasta@kernel.org>
>> ---
>>   drivers/gpu/drm/scheduler/sched_main.c | 37 +++++++++++++-------------
>>   1 file changed, 18 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>> index a45b02fd2af3..a26cc11c8ade 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -516,38 +516,37 @@ static void drm_sched_job_begin(struct drm_sched_job *s_job)
>>   
>>   static void drm_sched_job_timedout(struct work_struct *work)
>>   {
>> -	struct drm_gpu_scheduler *sched;
>> +	struct drm_gpu_scheduler *sched =
>> +		container_of(work, struct drm_gpu_scheduler, work_tdr.work);
>> +	enum drm_gpu_sched_stat status;
>>   	struct drm_sched_job *job;
>> -	enum drm_gpu_sched_stat status = DRM_GPU_SCHED_STAT_NOMINAL;
>> -
>> -	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
>>   
>>   	/* Protects against concurrent deletion in drm_sched_get_finished_job */
>>   	spin_lock(&sched->job_list_lock);
>>   	job = list_first_entry_or_null(&sched->pending_list,
>>   				       struct drm_sched_job, list);
>> -
>>   	if (job) {
>>   		/*
>>   		 * Remove the bad job so it cannot be freed by concurrent
>> -		 * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread
>> -		 * is parked at which point it's safe.
>> +		 * drm_sched_get_finished_job. It will be reinserted back after
>> +		 * scheduler worker is stopped at which point it's safe.
>>   		 */
>>   		list_del_init(&job->list);
>> -		spin_unlock(&sched->job_list_lock);
>> +	}
>> +	spin_unlock(&sched->job_list_lock);
>>   
>> -		status = job->sched->ops->timedout_job(job);
>> +	if (!job)
>> +		return;
>>   
>> -		/*
>> -		 * Guilty job did complete and hence needs to be manually removed
>> -		 * See drm_sched_stop doc.
>> -		 */
>> -		if (sched->free_guilty) {
>> -			job->sched->ops->free_job(job);
>> -			sched->free_guilty = false;
>> -		}
>> -	} else {
>> -		spin_unlock(&sched->job_list_lock);
>> +	status = job->sched->ops->timedout_job(job);
>> +
>> +	/*
>> +	 * Guilty job did complete and hence needs to be manually removed. See
>> +	 * documentation for drm_sched_stop.
>> +	 */
>> +	if (sched->free_guilty) {
>> +		job->sched->ops->free_job(job);
>> +		sched->free_guilty = false;
>>   	}
>>   
>>   	if (status != DRM_GPU_SCHED_STAT_ENODEV)
>
