All of lore.kernel.org
 help / color / mirror / Atom feed
From: Boris Brezillon <boris.brezillon@collabora.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: robdclark@chromium.org, Sarah Walker <sarah.walker@imgtec.com>,
	airlied@linux.ie, lina@asahilina.net,
	Frank Binns <Frank.Binns@imgtec.com>,
	dri-devel@lists.freedesktop.org, christian.koenig@amd.com,
	Luben Tuikov <luben.tuikov@amd.com>,
	Donald Robson <Donald.Robson@imgtec.com>,
	intel-xe@lists.freedesktop.org, faith.ekstrand@collabora.com
Subject: Re: [Intel-xe] [RFC PATCH 06/10] drm/sched: Submit job before starting TDR
Date: Mon, 31 Jul 2023 09:26:45 +0200	[thread overview]
Message-ID: <20230731092645.4faf7048@collabora.com> (raw)
In-Reply-To: <ZMcHy4I/KNyZ2Q8k@DUT025-TGLU.fm.intel.com>

+the PVR devs

On Mon, 31 Jul 2023 01:00:59 +0000
Matthew Brost <matthew.brost@intel.com> wrote:

> On Thu, May 04, 2023 at 01:23:05AM -0400, Luben Tuikov wrote:
> > On 2023-04-03 20:22, Matthew Brost wrote:  
> > > If the TDR is set to a value, it can fire before a job is submitted in
> > > drm_sched_main. The job should be always be submitted before the TDR
> > > fires, fix this ordering.
> > > 
> > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > ---
> > >  drivers/gpu/drm/scheduler/sched_main.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > > index 6ae710017024..4eac02d212c1 100644
> > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > @@ -1150,10 +1150,10 @@ static void drm_sched_main(struct work_struct *w)
> > >  		s_fence = sched_job->s_fence;
> > >  
> > >  		atomic_inc(&sched->hw_rq_count);
> > > -		drm_sched_job_begin(sched_job);
> > >  
> > >  		trace_drm_run_job(sched_job, entity);
> > >  		fence = sched->ops->run_job(sched_job);
> > > +		drm_sched_job_begin(sched_job);
> > >  		complete_all(&entity->entity_idle);
> > >  		drm_sched_fence_scheduled(s_fence);
> > >    
> > 
> > Not sure if this is correct. In drm_sched_job_begin() we add the job to the "pending_list"
> > (meaning it is pending execution in the hardware) and we also start a timeout timer. Both
> > of those should be started before the job is given to the hardware.
> >   
> 
> The correct solution is probably add to pending list before run_job()
> and kick TDR after run_job().

This would make the PVR driver simpler too. Right now, the driver
iterates over the pending job list to signal jobs done_fences, but
there's a race between the interrupt handler (that's iterating over
this list to signal fences) and the drm_sched logic (that's inserting
the job in the pending_list after run_job() returns). The race is taken
care of with an addition field that's pointing to the last submitted
job [1], but if we can get rid of that logic, that's for the best.

[1]https://gitlab.freedesktop.org/frankbinns/powervr/-/blob/powervr-next/drivers/gpu/drm/imagination/pvr_queue.h#L119

WARNING: multiple messages have this Message-ID (diff)
From: Boris Brezillon <boris.brezillon@collabora.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: robdclark@chromium.org, thomas.hellstrom@linux.intel.com,
	Sarah Walker <sarah.walker@imgtec.com>,
	airlied@linux.ie, lina@asahilina.net,
	dri-devel@lists.freedesktop.org, christian.koenig@amd.com,
	Luben Tuikov <luben.tuikov@amd.com>,
	Donald Robson <Donald.Robson@imgtec.com>,
	intel-xe@lists.freedesktop.org, faith.ekstrand@collabora.com
Subject: Re: [RFC PATCH 06/10] drm/sched: Submit job before starting TDR
Date: Mon, 31 Jul 2023 09:26:45 +0200	[thread overview]
Message-ID: <20230731092645.4faf7048@collabora.com> (raw)
In-Reply-To: <ZMcHy4I/KNyZ2Q8k@DUT025-TGLU.fm.intel.com>

+the PVR devs

On Mon, 31 Jul 2023 01:00:59 +0000
Matthew Brost <matthew.brost@intel.com> wrote:

> On Thu, May 04, 2023 at 01:23:05AM -0400, Luben Tuikov wrote:
> > On 2023-04-03 20:22, Matthew Brost wrote:  
> > > If the TDR is set to a value, it can fire before a job is submitted in
> > > drm_sched_main. The job should be always be submitted before the TDR
> > > fires, fix this ordering.
> > > 
> > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > ---
> > >  drivers/gpu/drm/scheduler/sched_main.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > > index 6ae710017024..4eac02d212c1 100644
> > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > @@ -1150,10 +1150,10 @@ static void drm_sched_main(struct work_struct *w)
> > >  		s_fence = sched_job->s_fence;
> > >  
> > >  		atomic_inc(&sched->hw_rq_count);
> > > -		drm_sched_job_begin(sched_job);
> > >  
> > >  		trace_drm_run_job(sched_job, entity);
> > >  		fence = sched->ops->run_job(sched_job);
> > > +		drm_sched_job_begin(sched_job);
> > >  		complete_all(&entity->entity_idle);
> > >  		drm_sched_fence_scheduled(s_fence);
> > >    
> > 
> > Not sure if this is correct. In drm_sched_job_begin() we add the job to the "pending_list"
> > (meaning it is pending execution in the hardware) and we also start a timeout timer. Both
> > of those should be started before the job is given to the hardware.
> >   
> 
> The correct solution is probably add to pending list before run_job()
> and kick TDR after run_job().

This would make the PVR driver simpler too. Right now, the driver
iterates over the pending job list to signal jobs done_fences, but
there's a race between the interrupt handler (that's iterating over
this list to signal fences) and the drm_sched logic (that's inserting
the job in the pending_list after run_job() returns). The race is taken
care of with an addition field that's pointing to the last submitted
job [1], but if we can get rid of that logic, that's for the best.

[1]https://gitlab.freedesktop.org/frankbinns/powervr/-/blob/powervr-next/drivers/gpu/drm/imagination/pvr_queue.h#L119

  reply	other threads:[~2023-07-31  7:26 UTC|newest]

Thread overview: 176+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-04  0:22 [Intel-xe] [RFC PATCH 00/10] Xe DRM scheduler and long running workload plans Matthew Brost
2023-04-04  0:22 ` Matthew Brost
2023-04-04  0:22 ` [Intel-xe] [RFC PATCH 01/10] drm/sched: Convert drm scheduler to use a work queue rather than kthread Matthew Brost
2023-04-04  0:22   ` Matthew Brost
2023-06-09  6:58   ` [Intel-xe] " Boris Brezillon
2023-06-09  6:58     ` Boris Brezillon
2023-07-31  0:56     ` [Intel-xe] " Matthew Brost
2023-07-31  0:56       ` Matthew Brost
2023-04-04  0:22 ` [Intel-xe] [RFC PATCH 02/10] drm/sched: Move schedule policy to scheduler / entity Matthew Brost
2023-04-04  0:22   ` Matthew Brost
2023-04-04  7:24   ` kernel test robot
2023-04-05 17:37   ` [Intel-xe] " Luben Tuikov
2023-04-05 17:37     ` Luben Tuikov
2023-04-05 18:29     ` [Intel-xe] " Matthew Brost
2023-04-05 18:29       ` Matthew Brost
2023-04-04  0:22 ` [Intel-xe] [RFC PATCH 03/10] drm/sched: Add DRM_SCHED_POLICY_SINGLE_ENTITY scheduling policy Matthew Brost
2023-04-04  0:22   ` Matthew Brost
2023-04-04  0:22 ` [Intel-xe] [RFC PATCH 04/10] drm/sched: Add generic scheduler message interface Matthew Brost
2023-04-04  0:22   ` Matthew Brost
2023-05-04  5:28   ` [Intel-xe] " Luben Tuikov
2023-05-04  5:28     ` Luben Tuikov
2023-07-31  2:42     ` [Intel-xe] " Matthew Brost
2023-07-31  2:42       ` Matthew Brost
2023-04-04  0:22 ` [Intel-xe] [RFC PATCH 05/10] drm/sched: Start run wq before TDR in drm_sched_start Matthew Brost
2023-04-04  0:22   ` Matthew Brost
2023-04-04  0:22 ` [Intel-xe] [RFC PATCH 06/10] drm/sched: Submit job before starting TDR Matthew Brost
2023-04-04  0:22   ` Matthew Brost
2023-05-04  5:23   ` [Intel-xe] " Luben Tuikov
2023-05-04  5:23     ` Luben Tuikov
2023-07-31  1:00     ` [Intel-xe] " Matthew Brost
2023-07-31  1:00       ` Matthew Brost
2023-07-31  7:26       ` Boris Brezillon [this message]
2023-07-31  7:26         ` Boris Brezillon
2023-08-31 19:48         ` [Intel-xe] " Luben Tuikov
2023-08-31 19:48           ` Luben Tuikov
2023-04-04  0:22 ` [Intel-xe] [RFC PATCH 07/10] drm/sched: Add helper to set TDR timeout Matthew Brost
2023-04-04  0:22   ` Matthew Brost
2023-05-04  5:28   ` [Intel-xe] " Luben Tuikov
2023-05-04  5:28     ` Luben Tuikov
2023-07-31  1:09     ` [Intel-xe] " Matthew Brost
2023-07-31  1:09       ` Matthew Brost
2023-08-31 19:52       ` [Intel-xe] " Luben Tuikov
2023-08-31 19:52         ` Luben Tuikov
2023-04-04  0:22 ` [Intel-xe] [RFC PATCH 08/10] dma-buf/dma-fence: Introduce long-running completion fences Matthew Brost
2023-04-04  0:22   ` Matthew Brost
2023-04-04  5:18   ` kernel test robot
2023-04-04  9:09   ` [Intel-xe] " Christian König
2023-04-04  9:09     ` Christian König
2023-04-04 12:54     ` [Intel-xe] " Thomas Hellström
2023-04-04 12:54       ` Thomas Hellström
2023-04-04 13:10       ` [Intel-xe] " Christian König
2023-04-04 13:10         ` Christian König
2023-04-04 18:14         ` [Intel-xe] " Thomas Hellström (Intel)
2023-04-04 18:14           ` Thomas Hellström (Intel)
2023-04-04 19:02           ` [Intel-xe] " Matthew Brost
2023-04-04 19:02             ` Matthew Brost
2023-04-04 19:25             ` [Intel-xe] " Daniel Vetter
2023-04-04 19:25               ` Daniel Vetter
2023-04-04 19:48               ` [Intel-xe] " Matthew Brost
2023-04-04 19:48                 ` Matthew Brost
2023-04-05 13:09                 ` [Intel-xe] " Daniel Vetter
2023-04-05 13:09                   ` Daniel Vetter
2023-04-05 23:58                   ` [Intel-xe] " Matthew Brost
2023-04-05 23:58                     ` Matthew Brost
2023-04-06  6:32                     ` [Intel-xe] " Daniel Vetter
2023-04-06  6:32                       ` Daniel Vetter
2023-04-06 16:58                       ` [Intel-xe] " Matthew Brost
2023-04-06 16:58                         ` Matthew Brost
2023-04-06 17:09                         ` [Intel-xe] " Daniel Vetter
2023-04-06 17:09                           ` Daniel Vetter
2023-04-05 12:35               ` [Intel-xe] " Thomas Hellström
2023-04-05 12:35                 ` Thomas Hellström
2023-04-05 12:39                 ` [Intel-xe] " Christian König
2023-04-05 12:39                   ` Christian König
2023-04-05 12:45                   ` [Intel-xe] " Daniel Vetter
2023-04-05 12:45                     ` Daniel Vetter
2023-04-05 14:08                     ` [Intel-xe] " Christian König
2023-04-05 14:08                       ` Christian König
2023-04-04 19:00         ` [Intel-xe] " Daniel Vetter
2023-04-04 19:00           ` Daniel Vetter
2023-04-04 20:03           ` [Intel-xe] " Matthew Brost
2023-04-04 20:03             ` Matthew Brost
2023-04-04 20:11             ` [Intel-xe] " Daniel Vetter
2023-04-04 20:11               ` Daniel Vetter
2023-04-04 20:19               ` [Intel-xe] " Matthew Brost
2023-04-04 20:19                 ` Matthew Brost
2023-04-04 20:31                 ` [Intel-xe] " Daniel Vetter
2023-04-04 20:31                   ` Daniel Vetter
2023-04-04 20:46                   ` [Intel-xe] " Matthew Brost
2023-04-04 20:46                     ` Matthew Brost
2023-04-04 14:45   ` kernel test robot
2023-04-04  0:22 ` [Intel-xe] [RFC PATCH 09/10] drm/sched: Support long-running sched entities Matthew Brost
2023-04-04  0:22   ` Matthew Brost
2023-04-04  0:22 ` [Intel-xe] [RFC PATCH 10/10] drm/syncobj: Warn on long running dma-fences Matthew Brost
2023-04-04  0:22   ` Matthew Brost
2023-04-04  0:24 ` [Intel-xe] ✗ CI.Patch_applied: failure for Xe DRM scheduler and long running workload plans Patchwork
2023-04-04  1:07 ` [Intel-xe] [RFC PATCH 00/10] " Asahi Lina
2023-04-04  1:07   ` Asahi Lina
2023-04-04  1:58   ` [Intel-xe] " Matthew Brost
2023-04-04  1:58     ` Matthew Brost
2023-04-08  7:05     ` [Intel-xe] " Asahi Lina
2023-04-08  7:05       ` Asahi Lina
2023-04-11 14:07       ` [Intel-xe] " Daniel Vetter
2023-04-11 14:07         ` Daniel Vetter
2023-04-12  5:47         ` [Intel-xe] " Asahi Lina
2023-04-12  5:47           ` Asahi Lina
2023-04-12  8:18           ` [Intel-xe] " Daniel Vetter
2023-04-12  8:18             ` Daniel Vetter
2023-04-17  0:03       ` [Intel-xe] " Matthew Brost
2023-04-17  0:03         ` Matthew Brost
2023-04-04  9:04 ` [Intel-xe] " Christian König
2023-04-04  9:04   ` Christian König
2023-04-04 13:23   ` [Intel-xe] " Matthew Brost
2023-04-04 13:23     ` Matthew Brost
2023-04-04  9:13 ` [Intel-xe] " Christian König
2023-04-04  9:13   ` Christian König
2023-04-04 13:37   ` [Intel-xe] " Matthew Brost
2023-04-04 13:37     ` Matthew Brost
2023-04-05  7:41     ` [Intel-xe] " Christian König
2023-04-05  7:41       ` Christian König
2023-04-05  8:34       ` [Intel-xe] " Daniel Vetter
2023-04-05  8:34         ` Daniel Vetter
2023-04-05  8:53         ` [Intel-xe] " Christian König
2023-04-05  8:53           ` Christian König
2023-04-05  9:07           ` [Intel-xe] " Daniel Vetter
2023-04-05  9:07             ` Daniel Vetter
2023-04-05  9:57             ` [Intel-xe] " Christian König
2023-04-05  9:57               ` Christian König
2023-04-05 10:12               ` [Intel-xe] " Daniel Vetter
2023-04-05 10:12                 ` Daniel Vetter
2023-04-06  2:08                 ` [Intel-xe] " Matthew Brost
2023-04-06  2:08                   ` Matthew Brost
2023-04-06  6:37                   ` [Intel-xe] " Daniel Vetter
2023-04-06  6:37                     ` Daniel Vetter
2023-04-06 10:14                     ` [Intel-xe] " Christian König
2023-04-06 10:14                       ` Christian König
2023-04-06 10:32                       ` [Intel-xe] " Daniel Vetter
2023-04-06 10:32                         ` Daniel Vetter
2023-04-04  9:43 ` [Intel-xe] " Tvrtko Ursulin
2023-04-04  9:43   ` Tvrtko Ursulin
2023-04-04  9:48   ` [Intel-xe] " Christian König
2023-04-04  9:48     ` Christian König
2023-04-04 13:43     ` [Intel-xe] " Matthew Brost
2023-04-04 13:43       ` Matthew Brost
2023-04-04 13:52   ` [Intel-xe] " Matthew Brost
2023-04-04 13:52     ` Matthew Brost
2023-04-04 17:29     ` [Intel-xe] " Tvrtko Ursulin
2023-04-04 17:29       ` Tvrtko Ursulin
2023-04-04 19:07       ` [Intel-xe] " Daniel Vetter
2023-04-04 19:07         ` Daniel Vetter
2023-04-04 18:02 ` [Intel-xe] " Zeng, Oak
2023-04-04 18:02   ` Zeng, Oak
2023-04-04 18:08   ` [Intel-xe] " Matthew Brost
2023-04-04 18:08     ` Matthew Brost
2023-04-05  7:30     ` [Intel-xe] " Christian König
2023-04-05  7:30       ` Christian König
2023-04-05  8:42       ` [Intel-xe] " Daniel Vetter
2023-04-05  8:42         ` Daniel Vetter
2023-04-05 18:06       ` [Intel-xe] " Zeng, Oak
2023-04-05 18:06         ` Zeng, Oak
2023-04-05 18:53         ` [Intel-xe] " Matthew Brost
2023-04-05 18:53           ` Matthew Brost
2023-04-06 10:04           ` [Intel-xe] " Christian König
2023-04-06 10:04             ` Christian König
2023-04-07  0:20           ` [Intel-xe] " Zeng, Oak
2023-04-07  0:20             ` Zeng, Oak
2023-04-11  9:02             ` [Intel-xe] " Christian König
2023-04-11  9:02               ` Christian König
2023-04-11 14:13               ` [Intel-xe] " Daniel Vetter
2023-04-11 14:13                 ` Daniel Vetter
2023-04-17  6:47                 ` [Intel-xe] " Christian König
2023-04-17  6:47                   ` Christian König
2023-04-17  8:39                   ` [Intel-xe] " Daniel Vetter
2023-04-17  8:39                     ` Daniel Vetter
2023-04-18 15:10 ` [Intel-xe] " Liviu Dudau
2023-04-18 15:10   ` Liviu Dudau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230731092645.4faf7048@collabora.com \
    --to=boris.brezillon@collabora.com \
    --cc=Donald.Robson@imgtec.com \
    --cc=Frank.Binns@imgtec.com \
    --cc=airlied@linux.ie \
    --cc=christian.koenig@amd.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=faith.ekstrand@collabora.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=lina@asahilina.net \
    --cc=luben.tuikov@amd.com \
    --cc=matthew.brost@intel.com \
    --cc=robdclark@chromium.org \
    --cc=sarah.walker@imgtec.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.