All of lore.kernel.org
 help / color / mirror / Atom feed
From: Boris Brezillon <boris.brezillon@collabora.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Subject: Re: [Intel-gfx] [RFC PATCH 04/20] drm/sched: Convert drm scheduler to use a work queue rather than kthread
Date: Fri, 30 Dec 2022 12:55:08 +0100	[thread overview]
Message-ID: <20221230125508.57af8a14@collabora.com> (raw)
In-Reply-To: <20221230112042.2ddd1946@collabora.com>

On Fri, 30 Dec 2022 11:20:42 +0100
Boris Brezillon <boris.brezillon@collabora.com> wrote:

> Hello Matthew,
> 
> On Thu, 22 Dec 2022 14:21:11 -0800
> Matthew Brost <matthew.brost@intel.com> wrote:
> 
> > In XE, the new Intel GPU driver, a choice has made to have a 1 to 1
> > mapping between a drm_gpu_scheduler and drm_sched_entity. At first this
> > seems a bit odd but let us explain the reasoning below.
> > 
> > 1. In XE the submission order from multiple drm_sched_entity is not
> > guaranteed to be the same completion even if targeting the same hardware
> > engine. This is because in XE we have a firmware scheduler, the GuC,
> > which allowed to reorder, timeslice, and preempt submissions. If a using
> > shared drm_gpu_scheduler across multiple drm_sched_entity, the TDR falls
> > apart as the TDR expects submission order == completion order. Using a
> > dedicated drm_gpu_scheduler per drm_sched_entity solve this problem.  
> 
> Oh, that's interesting. I've been trying to solve the same sort of
> issues to support Arm's new Mali GPU which is relying on a FW-assisted
> scheduling scheme (you give the FW N streams to execute, and it does
> the scheduling between those N command streams, the kernel driver
> does timeslice scheduling to update the command streams passed to the
> FW). I must admit I gave up on using drm_sched at some point, mostly
> because the integration with drm_sched was painful, but also because I
> felt trying to bend drm_sched to make it interact with a
> timeslice-oriented scheduling model wasn't really future proof. Giving
> drm_sched_entity exlusive access to a drm_gpu_scheduler probably might
> help for a few things (didn't think it through yet), but I feel it's
> coming short on other aspects we have to deal with on Arm GPUs.

Ok, so I just had a quick look at the Xe driver and how it
instantiates the drm_sched_entity and drm_gpu_scheduler, and I think I
have a better understanding of how you get away with using drm_sched
while still controlling how scheduling is really done. Here
drm_gpu_scheduler is just a dummy abstract that let's you use the
drm_sched job queuing/dep/tracking mechanism. The whole run-queue
selection is dumb because there's only one entity ever bound to the
scheduler (the one that's part of the xe_guc_engine object which also
contains the drm_gpu_scheduler instance). I guess the main issue we'd
have on Arm is the fact that the stream doesn't necessarily get
scheduled when ->run_job() is called, it can be placed in the runnable
queue and be picked later by the kernel-side scheduler when a FW slot
gets released. That can probably be sorted out by manually disabling the
job timer and re-enabling it when the stream gets picked by the
scheduler. But my main concern remains, we're basically abusing
drm_sched here.

For the Arm driver, that means turning the following sequence

1. wait for job deps
2. queue job to ringbuf and push the stream to the runnable
   queue (if it wasn't queued already). Wakeup the timeslice scheduler
   to re-evaluate (if the stream is not on a FW slot already)
3. stream gets picked by the timeslice scheduler and sent to the FW for
   execution

into

1. queue job to entity which takes care of waiting for job deps for
   us
2. schedule a drm_sched_main iteration
3. the only available entity is picked, and the first job from this
   entity is dequeued. ->run_job() is called: the job is queued to the
   ringbuf and the stream is pushed to the runnable queue (if it wasn't
   queued already). Wakeup the timeslice scheduler to re-evaluate (if
   the stream is not on a FW slot already)
4. stream gets picked by the timeslice scheduler and sent to the FW for
   execution

That's one extra step we don't really need. To sum-up, yes, all the
job/entity tracking might be interesting to share/re-use, but I wonder
if we couldn't have that without pulling out the scheduling part of
drm_sched, or maybe I'm missing something, and there's something in
drm_gpu_scheduler you really need.

WARNING: multiple messages have this Message-ID (diff)
From: Boris Brezillon <boris.brezillon@collabora.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Subject: Re: [RFC PATCH 04/20] drm/sched: Convert drm scheduler to use a work queue rather than kthread
Date: Fri, 30 Dec 2022 12:55:08 +0100	[thread overview]
Message-ID: <20221230125508.57af8a14@collabora.com> (raw)
In-Reply-To: <20221230112042.2ddd1946@collabora.com>

On Fri, 30 Dec 2022 11:20:42 +0100
Boris Brezillon <boris.brezillon@collabora.com> wrote:

> Hello Matthew,
> 
> On Thu, 22 Dec 2022 14:21:11 -0800
> Matthew Brost <matthew.brost@intel.com> wrote:
> 
> > In XE, the new Intel GPU driver, a choice has made to have a 1 to 1
> > mapping between a drm_gpu_scheduler and drm_sched_entity. At first this
> > seems a bit odd but let us explain the reasoning below.
> > 
> > 1. In XE the submission order from multiple drm_sched_entity is not
> > guaranteed to be the same completion even if targeting the same hardware
> > engine. This is because in XE we have a firmware scheduler, the GuC,
> > which allowed to reorder, timeslice, and preempt submissions. If a using
> > shared drm_gpu_scheduler across multiple drm_sched_entity, the TDR falls
> > apart as the TDR expects submission order == completion order. Using a
> > dedicated drm_gpu_scheduler per drm_sched_entity solve this problem.  
> 
> Oh, that's interesting. I've been trying to solve the same sort of
> issues to support Arm's new Mali GPU which is relying on a FW-assisted
> scheduling scheme (you give the FW N streams to execute, and it does
> the scheduling between those N command streams, the kernel driver
> does timeslice scheduling to update the command streams passed to the
> FW). I must admit I gave up on using drm_sched at some point, mostly
> because the integration with drm_sched was painful, but also because I
> felt trying to bend drm_sched to make it interact with a
> timeslice-oriented scheduling model wasn't really future proof. Giving
> drm_sched_entity exlusive access to a drm_gpu_scheduler probably might
> help for a few things (didn't think it through yet), but I feel it's
> coming short on other aspects we have to deal with on Arm GPUs.

Ok, so I just had a quick look at the Xe driver and how it
instantiates the drm_sched_entity and drm_gpu_scheduler, and I think I
have a better understanding of how you get away with using drm_sched
while still controlling how scheduling is really done. Here
drm_gpu_scheduler is just a dummy abstract that let's you use the
drm_sched job queuing/dep/tracking mechanism. The whole run-queue
selection is dumb because there's only one entity ever bound to the
scheduler (the one that's part of the xe_guc_engine object which also
contains the drm_gpu_scheduler instance). I guess the main issue we'd
have on Arm is the fact that the stream doesn't necessarily get
scheduled when ->run_job() is called, it can be placed in the runnable
queue and be picked later by the kernel-side scheduler when a FW slot
gets released. That can probably be sorted out by manually disabling the
job timer and re-enabling it when the stream gets picked by the
scheduler. But my main concern remains, we're basically abusing
drm_sched here.

For the Arm driver, that means turning the following sequence

1. wait for job deps
2. queue job to ringbuf and push the stream to the runnable
   queue (if it wasn't queued already). Wakeup the timeslice scheduler
   to re-evaluate (if the stream is not on a FW slot already)
3. stream gets picked by the timeslice scheduler and sent to the FW for
   execution

into

1. queue job to entity which takes care of waiting for job deps for
   us
2. schedule a drm_sched_main iteration
3. the only available entity is picked, and the first job from this
   entity is dequeued. ->run_job() is called: the job is queued to the
   ringbuf and the stream is pushed to the runnable queue (if it wasn't
   queued already). Wakeup the timeslice scheduler to re-evaluate (if
   the stream is not on a FW slot already)
4. stream gets picked by the timeslice scheduler and sent to the FW for
   execution

That's one extra step we don't really need. To sum-up, yes, all the
job/entity tracking might be interesting to share/re-use, but I wonder
if we couldn't have that without pulling out the scheduling part of
drm_sched, or maybe I'm missing something, and there's something in
drm_gpu_scheduler you really need.

  reply	other threads:[~2022-12-30 11:55 UTC|newest]

Thread overview: 161+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-22 22:21 [Intel-gfx] [RFC PATCH 00/20] Initial Xe driver submission Matthew Brost
2022-12-22 22:21 ` Matthew Brost
2022-12-22 22:21 ` [Intel-gfx] [RFC PATCH 01/20] drm/suballoc: Introduce a generic suballocation manager Matthew Brost
2022-12-22 22:21   ` Matthew Brost
2022-12-22 22:21 ` [Intel-gfx] [RFC PATCH 02/20] drm/amd: Convert amdgpu to use suballocation helper Matthew Brost
2022-12-22 22:21   ` Matthew Brost
2022-12-22 22:21 ` [Intel-gfx] [RFC PATCH 03/20] drm/radeon: Use the drm suballocation manager implementation Matthew Brost
2022-12-22 22:21   ` Matthew Brost
2022-12-22 22:21 ` [Intel-gfx] [RFC PATCH 04/20] drm/sched: Convert drm scheduler to use a work queue rather than kthread Matthew Brost
2022-12-22 22:21   ` Matthew Brost
2022-12-23 17:42   ` [Intel-gfx] " Rob Clark
2022-12-28 22:21     ` Matthew Brost
2022-12-30 10:20   ` Boris Brezillon
2022-12-30 10:20     ` Boris Brezillon
2022-12-30 11:55     ` Boris Brezillon [this message]
2022-12-30 11:55       ` Boris Brezillon
2023-01-02  7:30       ` [Intel-gfx] " Boris Brezillon
2023-01-02  7:30         ` Boris Brezillon
2023-01-03 13:02         ` [Intel-gfx] " Tvrtko Ursulin
2023-01-03 14:21           ` Boris Brezillon
2023-01-03 14:21             ` Boris Brezillon
2023-01-05 21:43           ` Matthew Brost
2023-01-05 21:43             ` Matthew Brost
2023-01-06 23:52             ` Matthew Brost
2023-01-09 13:46               ` Tvrtko Ursulin
2023-01-09 17:27                 ` Jason Ekstrand
2023-01-09 17:27                   ` Jason Ekstrand
2023-01-10 11:28                   ` Tvrtko Ursulin
2023-01-10 11:28                     ` Tvrtko Ursulin
2023-01-10 12:19                     ` Tvrtko Ursulin
2023-01-10 12:19                       ` Tvrtko Ursulin
2023-01-10 15:55                       ` Matthew Brost
2023-01-10 15:55                         ` Matthew Brost
2023-01-10 16:50                         ` Tvrtko Ursulin
2023-01-10 16:50                           ` Tvrtko Ursulin
2023-01-10 19:01                           ` Matthew Brost
2023-01-10 19:01                             ` Matthew Brost
2023-01-11  9:17                             ` Tvrtko Ursulin
2023-01-11  9:17                               ` Tvrtko Ursulin
2023-01-11 18:07                               ` Matthew Brost
2023-01-11 18:07                                 ` Matthew Brost
2023-01-11 18:52                                 ` John Harrison
2023-01-11 18:55                                   ` Matthew Brost
2023-01-11 18:55                                     ` Matthew Brost
2023-01-10 14:08                     ` Jason Ekstrand
2023-01-10 14:08                       ` Jason Ekstrand
2023-01-11  8:50                       ` Tvrtko Ursulin
2023-01-11  8:50                         ` Tvrtko Ursulin
2023-01-11 19:40                         ` Matthew Brost
2023-01-11 19:40                           ` Matthew Brost
2023-01-12 18:43                           ` Tvrtko Ursulin
2023-01-12 18:43                             ` Tvrtko Ursulin
2023-01-11 22:18                         ` Jason Ekstrand
2023-01-11 22:18                           ` Jason Ekstrand
2023-01-11 22:31                           ` Matthew Brost
2023-01-11 22:31                             ` Matthew Brost
2023-01-11 22:56                             ` Jason Ekstrand
2023-01-11 22:56                               ` Jason Ekstrand
2023-01-13  0:39                               ` John Harrison
2023-01-18  3:06                                 ` Matthew Brost
2023-01-18  3:06                                   ` Matthew Brost
2023-01-10 16:39                     ` Matthew Brost
2023-01-10 16:39                       ` Matthew Brost
2023-01-11  1:13                       ` Matthew Brost
2023-01-11  1:13                         ` Matthew Brost
2023-01-11  9:09                         ` Tvrtko Ursulin
2023-01-11  9:09                           ` Tvrtko Ursulin
2023-01-11 17:52                           ` Matthew Brost
2023-01-11 17:52                             ` Matthew Brost
2023-01-12 18:21                             ` Tvrtko Ursulin
2023-01-12 18:21                               ` Tvrtko Ursulin
2023-01-05 19:40         ` Matthew Brost
2023-01-05 19:40           ` Matthew Brost
2023-01-09 15:45           ` [Intel-gfx] " Jason Ekstrand
2023-01-09 15:45             ` Jason Ekstrand
2023-01-09 17:17             ` Boris Brezillon
2023-01-09 17:17               ` Boris Brezillon
2023-01-09 20:40               ` Daniel Vetter
2023-01-09 20:40                 ` Daniel Vetter
2023-01-10  8:46                 ` Boris Brezillon
2023-01-10  8:46                   ` Boris Brezillon
2023-01-11 21:47                   ` Daniel Vetter
2023-01-11 21:47                     ` Daniel Vetter
2023-01-12  9:10                     ` Boris Brezillon
2023-01-12  9:10                       ` Boris Brezillon
2023-01-12  9:32                       ` Daniel Vetter
2023-01-12  9:32                         ` Daniel Vetter
2023-01-12 10:11                         ` Boris Brezillon
2023-01-12 10:11                           ` Boris Brezillon
2023-01-12 10:25                           ` Boris Brezillon
2023-01-12 10:25                             ` Boris Brezillon
2023-01-12 10:42                             ` Daniel Vetter
2023-01-12 10:42                               ` Daniel Vetter
2023-01-12 12:08                               ` Boris Brezillon
2023-01-12 12:08                                 ` Boris Brezillon
2023-01-12 15:38                                 ` Daniel Vetter
2023-01-12 15:38                                   ` Daniel Vetter
2023-01-12 16:48                                   ` Boris Brezillon
2023-01-12 16:48                                     ` Boris Brezillon
2023-01-12 10:30                           ` Boris Brezillon
2023-01-12 10:30                             ` Boris Brezillon
2022-12-22 22:21 ` [Intel-gfx] [RFC PATCH 05/20] drm/sched: Add generic scheduler message interface Matthew Brost
2022-12-22 22:21   ` Matthew Brost
2022-12-22 22:21 ` [Intel-gfx] [RFC PATCH 06/20] drm/sched: Start run wq before TDR in drm_sched_start Matthew Brost
2022-12-22 22:21   ` Matthew Brost
2022-12-22 22:21 ` [Intel-gfx] [RFC PATCH 07/20] drm/sched: Submit job before starting TDR Matthew Brost
2022-12-22 22:21   ` Matthew Brost
2022-12-22 22:21 ` [Intel-gfx] [RFC PATCH 08/20] drm/sched: Add helper to set TDR timeout Matthew Brost
2022-12-22 22:21   ` Matthew Brost
2022-12-22 22:21 ` [Intel-gfx] [RFC PATCH 09/20] drm: Add a gpu page-table walker helper Matthew Brost
2022-12-22 22:21   ` Matthew Brost
2022-12-22 22:21 ` [Intel-gfx] [RFC PATCH 10/20] drm/ttm: Don't print error message if eviction was interrupted Matthew Brost
2022-12-22 22:21   ` Matthew Brost
2022-12-22 22:21 ` [Intel-gfx] [RFC PATCH 11/20] drm/i915: Remove gem and overlay frontbuffer tracking Matthew Brost
2022-12-22 22:21   ` Matthew Brost
2022-12-23 11:13   ` [Intel-gfx] " Tvrtko Ursulin
2022-12-22 22:21 ` [Intel-gfx] [RFC PATCH 12/20] drm/i915/display: Neuter frontbuffer tracking harder Matthew Brost
2022-12-22 22:21   ` Matthew Brost
2022-12-22 22:21 ` [Intel-gfx] [RFC PATCH 13/20] drm/i915/display: Add more macros to remove all direct calls to uncore Matthew Brost
2022-12-22 22:21   ` Matthew Brost
2022-12-22 22:21 ` [Intel-gfx] [RFC PATCH 14/20] drm/i915/display: Remove all uncore mmio accesses in favor of intel_de Matthew Brost
2022-12-22 22:21   ` Matthew Brost
2022-12-22 22:21 ` [Intel-gfx] [RFC PATCH 15/20] drm/i915: Rename find_section to find_bdb_section Matthew Brost
2022-12-22 22:21   ` Matthew Brost
2022-12-22 22:21 ` [Intel-gfx] [RFC PATCH 16/20] drm/i915/regs: Set DISPLAY_MMIO_BASE to 0 for xe Matthew Brost
2022-12-22 22:21   ` Matthew Brost
2022-12-22 22:21 ` [Intel-gfx] [RFC PATCH 17/20] drm/i915/display: Fix a use-after-free when intel_edp_init_connector fails Matthew Brost
2022-12-22 22:21   ` Matthew Brost
2022-12-22 22:21 ` [Intel-gfx] [RFC PATCH 18/20] drm/i915/display: Remaining changes to make xe compile Matthew Brost
2022-12-22 22:21   ` Matthew Brost
2022-12-22 22:21 ` [Intel-gfx] [RFC PATCH 19/20] sound/hda: Allow XE as i915 replacement for sound Matthew Brost
2022-12-22 22:21   ` Matthew Brost
2022-12-22 22:21 ` [Intel-gfx] [RFC PATCH 20/20] mei/hdcp: Also enable for XE Matthew Brost
2022-12-22 22:21   ` Matthew Brost
2022-12-22 22:41 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for Initial Xe driver submission Patchwork
2023-01-02  8:14 ` [Intel-gfx] [RFC PATCH 00/20] " Thomas Zimmermann
2023-01-02  8:14   ` Thomas Zimmermann
2023-01-02 11:42   ` [Intel-gfx] " Jani Nikula
2023-01-02 11:42     ` Jani Nikula
2023-01-03 13:56     ` [Intel-gfx] " Boris Brezillon
2023-01-03 13:56       ` Boris Brezillon
2023-01-03 14:41       ` [Intel-gfx] " Alyssa Rosenzweig
2023-01-03 14:41         ` Alyssa Rosenzweig
2023-01-03 12:21 ` [Intel-gfx] " Tvrtko Ursulin
2023-01-05 21:27   ` Matthew Brost
2023-01-12  9:54     ` Lucas De Marchi
2023-01-12  9:54       ` Lucas De Marchi
2023-01-12 17:10       ` Matthew Brost
2023-01-12 17:10         ` Matthew Brost
2023-01-17 16:40         ` Jason Ekstrand
2023-01-10 12:33 ` Boris Brezillon
2023-01-10 12:33   ` Boris Brezillon
2023-01-17 16:12 ` [Intel-gfx] " Jason Ekstrand
2023-02-17 20:51 ` Daniel Vetter
2023-02-17 20:51   ` Daniel Vetter
2023-02-27 12:46   ` [Intel-gfx] " Oded Gabbay
2023-02-27 12:46     ` Oded Gabbay
2023-03-01 23:00   ` [Intel-gfx] " Rodrigo Vivi
2023-03-01 23:00     ` Rodrigo Vivi
2023-03-09 15:10     ` Daniel Vetter
2023-03-09 15:10       ` Daniel Vetter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221230125508.57af8a14@collabora.com \
    --to=boris.brezillon@collabora.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=matthew.brost@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.