Intel-XE Archive on lore.kernel.org
From: Matthew Brost <matthew.brost@intel.com>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: robdclark@chromium.org, airlied@linux.ie, lina@asahilina.net,
	dri-devel@lists.freedesktop.org, christian.koenig@amd.com,
	boris.brezillon@collabora.com, intel-xe@lists.freedesktop.org,
	faith.ekstrand@collabora.com
Subject: Re: [Intel-xe] [RFC PATCH 00/10] Xe DRM scheduler and long running workload plans
Date: Tue, 4 Apr 2023 13:52:02 +0000
Message-ID: <ZCwrgvAHGvdTCe7K@DUT025-TGLU.fm.intel.com>
In-Reply-To: <15dafc15-332e-2559-a9c4-61ad442ef44a@linux.intel.com>

On Tue, Apr 04, 2023 at 10:43:03AM +0100, Tvrtko Ursulin wrote:
> 
> On 04/04/2023 01:22, Matthew Brost wrote:
> > Hello,
> > 
> > As a prerequisite to merging the new Intel Xe DRM driver [1] [2], we
> > have been asked to merge our common DRM scheduler patches first as well
> > as develop a common solution for long running workloads with the DRM
> > scheduler. This RFC series is our first attempt at doing this. We
> > welcome any and all feedback.
> > 
> > This can be thought of as 4 parts, detailed below.
> > 
> > - DRM scheduler changes for 1 to 1 relationship between scheduler and
> > entity (patches 1-3)
> > 
> > In Xe all of the scheduling of jobs is done by a firmware scheduler (the
> > GuC), which is a new paradigm WRT the DRM scheduler and presents
> > several problems, as the DRM scheduler was originally designed to
> > schedule jobs on hardware queues. The main problem is that the DRM
> > scheduler expects the submission order of jobs to be the completion
> > order of jobs, even across multiple entities. This assumption falls
> > apart with a firmware scheduler, as a firmware scheduler has no concept
> > of jobs and jobs can complete out of order. A novel solution for this
> > was originally thought of by Faith during the initial prototype of Xe:
> > create a 1 to 1 relationship between scheduler and entity. I believe the
> > AGX driver [3] is using this approach, and Boris may use this approach
> > as well for the Mali driver [4].
> > 
> > To support a 1 to 1 relationship we move the main execution function
> > from a kthread to a work queue and add a new scheduling mode which
> > bypasses code in the DRM scheduler that isn't needed in a 1 to 1
> > relationship. The new scheduling mode should unify how all drivers use
> > a 1 to 1 relationship, and can be thought of as using the scheduler as a
> > dependency / in-flight job tracker rather than a true scheduler.
> 
> Once you add capability for a more proper 1:1 via
> DRM_SCHED_POLICY_SINGLE_ENTITY, do you still have further need to replace
> kthreads with a wq?
> 
> Or in other words, what purpose does offloading the job picking code to
> a separate execution context serve? Could it be done directly in the 1:1
> mode, leaving the kthread setup for N:M?
> 

Addressed the other two in my reply to Christian...

For this one: the concept of a single entry point IMO is a very good
concept which I'd like to keep. But the most important reason is that the
main execution thread (now a worker) is kicked when a dependency for a
job is resolved. Dependencies are dma-fences signaled via a callback, and
these callbacks can be signaled in IRQ context. We absolutely do not want
to enter the backend in IRQ context, for a variety of reasons.
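A minimal userspace sketch of the point above, assuming one entity per scheduler (the 1:1 mode). Everything here is hypothetical: `struct sched`, `sched_worker()`, `fence_signal()` and the rest are stand-ins, not the drm_sched or dma-fence API; `work_pending` models a `queue_work()` call, and `in_irq_context` models a fence being signaled from an interrupt handler. The dependency callback only kicks the worker, and the backend is entered solely from the worker, in submission order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of a 1:1 scheduler/entity. None of these
 * names are the real drm_sched, dma-fence or workqueue APIs.
 */
#define MAX_JOBS 8

static bool in_irq_context; /* true while a "fence callback" is running */

struct fence {
	bool signaled;
	void (*cb)(struct fence *f, void *data); /* armed by the worker */
	void *cb_data;
};

struct job {
	int id;
	struct fence *dep; /* single dependency, NULL if none */
};

struct sched {
	struct job queue[MAX_JOBS]; /* FIFO of the one entity */
	size_t head, tail;
	bool work_pending;          /* models queue_work() having been called */
	int submitted[MAX_JOBS];    /* job ids handed to the backend, in order */
	size_t nr_submitted;
};

/* Backend entry point: may sleep in a real driver, so never IRQ context. */
static void backend_run_job(struct sched *s, const struct job *j)
{
	assert(!in_irq_context);
	assert(s->nr_submitted < MAX_JOBS);
	s->submitted[s->nr_submitted++] = j->id;
}

/* Dependency callback: possibly IRQ context, so it only kicks the worker. */
static void dep_signaled_cb(struct fence *f, void *data)
{
	struct sched *s = data;

	(void)f;
	s->work_pending = true;
}

/* Signal a fence the way an interrupt handler would. */
void fence_signal(struct fence *f)
{
	f->signaled = true;
	if (f->cb) {
		in_irq_context = true;
		f->cb(f, f->cb_data);
		in_irq_context = false;
	}
}

void sched_push_job(struct sched *s, int id, struct fence *dep)
{
	assert(s->tail - s->head < MAX_JOBS);
	s->queue[s->tail++ % MAX_JOBS] = (struct job){ .id = id, .dep = dep };
	s->work_pending = true;
}

/*
 * Worker (process context): submit jobs in order until one has an
 * unsignaled dependency, then arm that fence's callback and stop.
 */
void sched_worker(struct sched *s)
{
	if (!s->work_pending)
		return;
	s->work_pending = false;
	while (s->head != s->tail) {
		struct job *j = &s->queue[s->head % MAX_JOBS];

		if (j->dep && !j->dep->signaled) {
			j->dep->cb = dep_signaled_cb;
			j->dep->cb_data = s;
			return;
		}
		backend_run_job(s, j);
		s->head++;
	}
}
```

For example, pushing job 1, job 2 (depending on an unsignaled fence) and job 3, then running the worker submits only job 1; signaling the fence merely sets `work_pending`, and the next worker run submits jobs 2 and 3. In a real driver the callback would be registered with `dma_fence_add_callback()` and the kick would be a `queue_work()` call.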

Matt

> Apart from those design-level questions, the low-level open question IMO
> is still that the default fallback of using system_wq has the potential
> to affect latency for other drivers. But that's for those driver owners
> to approve.
> 
> Regards,
> 
> Tvrtko
> 
> > - Generic messaging interface for DRM scheduler
> > 
> > The idea is to be able to communicate with the submission backend via
> > in-band (relative to the main execution function) messages. Messages are
> > backend defined and flexible enough for any use case. In Xe we use these
> > messages to clean up entities, set properties for entities, and suspend /
> > resume execution of an entity [5]. I suspect other drivers can leverage
> > this messaging concept too, as it is a convenient way to avoid races in
> > the backend.
> > 
> > - Support for using TDR for all error paths of a scheduler / entity
> > 
> > Fix a few races / bugs, and add a function to dynamically set the TDR
> > timeout.
> > 
> > - Annotate dma-fences for long running workloads.
> > 
> > The idea here is to use dma-fences only as sync points within the
> > scheduler and never export them for long running workloads. By
> > annotating these fences as long running we ensure that these dma-fences
> > are never used in a way that breaks the dma-fence rules. A benefit of
> > this approach is that the scheduler can still safely flow control the
> > execution ring buffer via the job limit without breaking the dma-fence
> > rules.
> > 
> > Again, this is a first draft, and I'm looking forward to feedback.
> > 
> > Enjoy - Matt
> > 
> > [1] https://gitlab.freedesktop.org/drm/xe/kernel
> > [2] https://patchwork.freedesktop.org/series/112188/
> > [3] https://patchwork.freedesktop.org/series/114772/
> > [4] https://patchwork.freedesktop.org/patch/515854/?series=112188&rev=1
> > [5] https://gitlab.freedesktop.org/drm/xe/kernel/-/blob/drm-xe-next/drivers/gpu/drm/xe/xe_guc_submit.c#L1031
> > 
> > Matthew Brost (8):
> >    drm/sched: Convert drm scheduler to use a work queue rather than
> >      kthread
> >    drm/sched: Move schedule policy to scheduler / entity
> >    drm/sched: Add DRM_SCHED_POLICY_SINGLE_ENTITY scheduling policy
> >    drm/sched: Add generic scheduler message interface
> >    drm/sched: Start run wq before TDR in drm_sched_start
> >    drm/sched: Submit job before starting TDR
> >    drm/sched: Add helper to set TDR timeout
> >    drm/syncobj: Warn on long running dma-fences
> > 
> > Thomas Hellström (2):
> >    dma-buf/dma-fence: Introduce long-running completion fences
> >    drm/sched: Support long-running sched entities
> > 
> >   drivers/dma-buf/dma-fence.c                 | 142 +++++++---
> >   drivers/dma-buf/dma-resv.c                  |   5 +
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  14 +-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  15 +-
> >   drivers/gpu/drm/drm_syncobj.c               |   5 +-
> >   drivers/gpu/drm/etnaviv/etnaviv_sched.c     |   5 +-
> >   drivers/gpu/drm/lima/lima_sched.c           |   5 +-
> >   drivers/gpu/drm/msm/adreno/adreno_device.c  |   6 +-
> >   drivers/gpu/drm/msm/msm_ringbuffer.c        |   5 +-
> >   drivers/gpu/drm/panfrost/panfrost_job.c     |   5 +-
> >   drivers/gpu/drm/scheduler/sched_entity.c    | 127 +++++++--
> >   drivers/gpu/drm/scheduler/sched_fence.c     |   6 +-
> >   drivers/gpu/drm/scheduler/sched_main.c      | 278 +++++++++++++++-----
> >   drivers/gpu/drm/v3d/v3d_sched.c             |  25 +-
> >   include/drm/gpu_scheduler.h                 | 130 +++++++--
> >   include/linux/dma-fence.h                   |  60 ++++-
> >   16 files changed, 649 insertions(+), 184 deletions(-)
> > 
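A minimal sketch of the in-band messaging idea from the cover letter, under stated assumptions: the opcodes mirror the Xe uses listed there (clean up, set a property, suspend / resume), but `struct entity_sched`, `esched_add_msg()` and `esched_worker()` are hypothetical and not the interface added by the series, and draining all pending messages before any job is a choice made by this sketch. Because messages and jobs are consumed by the same execution context, the backend state they touch needs no extra locking against job submission.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of in-band, backend-defined scheduler messages.
 * The opcodes mirror the Xe uses in the cover letter; the names are made up.
 */
#define QLEN 16

enum msg_opcode { MSG_SUSPEND, MSG_RESUME, MSG_SET_PRIORITY, MSG_CLEANUP };

struct msg {
	enum msg_opcode opcode;
	int arg;
};

struct entity_sched {
	struct msg msgs[QLEN];
	size_t msg_head, msg_tail;
	int jobs[QLEN];
	size_t job_head, job_tail;
	/* Backend state; only ever touched from esched_worker(). */
	bool suspended, dead;
	int priority;
	size_t nr_ran;
	int last_ran;
};

void esched_add_msg(struct entity_sched *s, enum msg_opcode op, int arg)
{
	assert(s->msg_tail - s->msg_head < QLEN);
	s->msgs[s->msg_tail++ % QLEN] = (struct msg){ .opcode = op, .arg = arg };
}

void esched_push_job(struct entity_sched *s, int id)
{
	assert(s->job_tail - s->job_head < QLEN);
	s->jobs[s->job_tail++ % QLEN] = id;
}

/* Backend-defined message handler. */
static void backend_process_msg(struct entity_sched *s, const struct msg *m)
{
	switch (m->opcode) {
	case MSG_SUSPEND:      s->suspended = true;  break;
	case MSG_RESUME:       s->suspended = false; break;
	case MSG_SET_PRIORITY: s->priority = m->arg; break;
	case MSG_CLEANUP:      s->dead = true;       break;
	}
}

/*
 * The single execution context: drain messages first, then run jobs.
 * Since both happen here, a message never races with job submission.
 */
void esched_worker(struct entity_sched *s)
{
	while (s->msg_head != s->msg_tail)
		backend_process_msg(s, &s->msgs[s->msg_head++ % QLEN]);

	while (!s->suspended && !s->dead && s->job_head != s->job_tail) {
		s->last_ran = s->jobs[s->job_head++ % QLEN];
		s->nr_ran++;
	}
}
```

A suspend message queued alongside a job keeps that job from running until a later resume message is processed by the same worker; after a cleanup message, queued jobs are never run.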
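A similarly hedged sketch of the long-running annotation described in the cover letter: a fence flagged `long_running` remains usable inside the scheduler, here to flow control the ring via a job limit, but any export path refuses it. `struct lr_fence`, `export_fence()`, `ring_try_submit()` and `JOB_LIMIT` are hypothetical; the series' syncobj patch is described as warning on such fences, whereas this sketch simply returns `-EINVAL`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model: a long-running fence may be used internally by the
 * scheduler, but must never be exported where dma-fence rules apply.
 */
struct lr_fence {
	bool signaled;
	bool long_running;
};

/* Stand-in for an export path (e.g. a syncobj); refuses long-running fences. */
int export_fence(const struct lr_fence *f)
{
	if (f->long_running)
		return -EINVAL;
	return 0;
}

#define JOB_LIMIT 2 /* max jobs in flight on the ring */

struct ring {
	struct lr_fence *inflight[JOB_LIMIT];
	int nr_inflight;
};

/*
 * Flow control: first retire jobs whose fences have signaled, then admit
 * the new job only if fewer than JOB_LIMIT jobs remain in flight.
 */
bool ring_try_submit(struct ring *r, struct lr_fence *f)
{
	int i, n = 0;

	for (i = 0; i < r->nr_inflight; i++)
		if (!r->inflight[i]->signaled)
			r->inflight[n++] = r->inflight[i];
	r->nr_inflight = n;

	if (r->nr_inflight == JOB_LIMIT)
		return false;
	r->inflight[r->nr_inflight++] = f;
	return true;
}
```

With `JOB_LIMIT` at 2, a third long-running job is held back until one of the first two fences signals, and none of those fences can ever be handed out through `export_fence()`.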
