Intel-XE Archive on lore.kernel.org
From: Matthew Brost <matthew.brost@intel.com>
To: Luben Tuikov <luben.tuikov@amd.com>
Cc: robdclark@chromium.org, sarah.walker@imgtec.com,
	ketil.johnsen@arm.com, Liviu.Dudau@arm.com, mcanal@igalia.com,
	frank.binns@imgtec.com, dri-devel@lists.freedesktop.org,
	christian.koenig@amd.com, boris.brezillon@collabora.com,
	donald.robson@imgtec.com, daniel@ffwll.ch, lina@asahilina.net,
	airlied@gmail.com, intel-xe@lists.freedesktop.org,
	faith.ekstrand@collabora.com
Subject: Re: [Intel-xe] [PATCH v3 09/13] drm/sched: Submit job before starting TDR
Date: Thu, 14 Sep 2023 17:48:22 +0000
Message-ID: <ZQNHZm4HPeyPBPMQ@DUT025-TGLU.fm.intel.com>
In-Reply-To: <ed3aca10-8a9f-4698-92f4-21558fa6cfe3@amd.com>

On Wed, Sep 13, 2023 at 10:56:10PM -0400, Luben Tuikov wrote:
> On 2023-09-11 22:16, Matthew Brost wrote:
> > If the TDR is set to a value, it can fire before a job is submitted in
> > drm_sched_main. The job should always be submitted before the TDR
> > fires; fix this ordering.
> > 
> > v2:
> >   - Add to pending list before run_job, start TDR after (Luben, Boris)
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/scheduler/sched_main.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index c627d3e6494a..9dbfab7be2c6 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -498,7 +498,6 @@ static void drm_sched_job_begin(struct drm_sched_job *s_job)
> >  
> >  	spin_lock(&sched->job_list_lock);
> >  	list_add_tail(&s_job->list, &sched->pending_list);
> > -	drm_sched_start_timeout(sched);
> >  	spin_unlock(&sched->job_list_lock);
> >  }
> >  
> > @@ -1234,6 +1233,7 @@ static void drm_sched_run_job_work(struct work_struct *w)
> >  		fence = sched->ops->run_job(sched_job);
> >  		complete_all(&entity->entity_idle);
> >  		drm_sched_fence_scheduled(s_fence, fence);
> > +		drm_sched_start_timeout_unlocked(sched);
> >  
> >  		if (!IS_ERR_OR_NULL(fence)) {
> >  			/* Drop for original kref_init of the fence */
> 
> So, sched->ops->run_job(), is a "job inflection point" from the point of view of
> the DRM scheduler. After that call, DRM has relinquished control of the job to the
> firmware/hardware.
> 
> Putting the job in the pending list, before submitting it down to the firmware/hardware,
> goes along with starting a timeout timer for the job. The timeout always includes
> time for the firmware/hardware to get it prepped, as well as time for the actual
> execution of the job (task). Thus, we want to do this:
> 	1. Put the job in pending list. "Pending list" means "pends in hardware".
> 	2. Start a timeout timer for the job.
> 	3. Start executing the job/task. This usually involves giving it to firmware/hardware,
> 	   i.e. ownership of the job/task changes to another domain. In our case this is accomplished
> 	   by calling sched->ops->run_job().
> Perhaps move drm_sched_start_timeout() closer to sched->ops->run_job() from above and/or increase
> the timeout value?

I disagree. There is a clear race: if the timeout starts before run_job(),
the TDR can fire before run_job() is called. The entire point of this
patch is to seal this race by starting the TDR after run_job() is
called.
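
To make the ordering argument concrete, here is a toy model of the two
arming points. It is only a sketch and not kernel code: the struct, the
enum, and both helper functions are hypothetical names made up for
illustration, times are arbitrary ticks, and the model assumes the TDR
simply fires a fixed timeout after it is armed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical timeline of one job submission, in arbitrary ticks. */
struct submit_timeline {
	long pending_at;	/* job is added to the pending list */
	long run_job_at;	/* run_job() hands the job to firmware/hardware */
	long timeout;		/* TDR timeout length */
};

/* Where the timeout is armed relative to run_job(). */
enum tdr_arm_point {
	ARM_BEFORE_RUN_JOB,	/* armed when the job enters the pending list */
	ARM_AFTER_RUN_JOB,	/* armed once run_job() has been called */
};

/* Tick at which the TDR would fire for the chosen arming point. */
static long tdr_deadline(const struct submit_timeline *t, enum tdr_arm_point p)
{
	long armed_at;

	/* The job is on the pending list before run_job() is called. */
	assert(t->run_job_at >= t->pending_at);

	armed_at = (p == ARM_BEFORE_RUN_JOB) ? t->pending_at : t->run_job_at;
	return armed_at + t->timeout;
}

/* True if the TDR would fire before the job was ever submitted. */
static bool tdr_fires_before_submit(const struct submit_timeline *t,
				    enum tdr_arm_point p)
{
	return tdr_deadline(t, p) < t->run_job_at;
}
```

In this model, a job that sits on the pending list for 50 ticks with a
10-tick timeout is flagged when the timer is armed before run_job(), and
is never flagged when it is armed after, because the deadline is then
measured from the submission itself.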

Matt

> -- 
> Regards,
> Luben
> 
