Re: [Intel-gfx] [RFC PATCH 04/20] drm/sched: Convert drm scheduler to use a work queue rather than kthread


From: Boris Brezillon <boris.brezillon@collabora.com>
To: Daniel Vetter <daniel@ffwll.ch>
Cc: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Subject: Re: [Intel-gfx] [RFC PATCH 04/20] drm/sched: Convert drm scheduler to use a work queue rather than kthread
Date: Tue, 10 Jan 2023 09:46:47 +0100
Message-ID: <20230110094647.5897dbdd@collabora.com>
In-Reply-To: <Y7x7tSsdgQvZ+JD0@phenom.ffwll.local>

Hi Daniel,

On Mon, 9 Jan 2023 21:40:21 +0100
Daniel Vetter <daniel@ffwll.ch> wrote:

> On Mon, Jan 09, 2023 at 06:17:48PM +0100, Boris Brezillon wrote:
> > Hi Jason,
> > 
> > On Mon, 9 Jan 2023 09:45:09 -0600
> > Jason Ekstrand <jason@jlekstrand.net> wrote:
> >   
> > > On Thu, Jan 5, 2023 at 1:40 PM Matthew Brost <matthew.brost@intel.com>
> > > wrote:
> > >   
> > > > On Mon, Jan 02, 2023 at 08:30:19AM +0100, Boris Brezillon wrote:    
> > > > > On Fri, 30 Dec 2022 12:55:08 +0100
> > > > > Boris Brezillon <boris.brezillon@collabora.com> wrote:
> > > > >    
> > > > > > On Fri, 30 Dec 2022 11:20:42 +0100
> > > > > > Boris Brezillon <boris.brezillon@collabora.com> wrote:
> > > > > >    
> > > > > > > Hello Matthew,
> > > > > > >
> > > > > > > On Thu, 22 Dec 2022 14:21:11 -0800
> > > > > > > Matthew Brost <matthew.brost@intel.com> wrote:
> > > > > > >    
> > > > > > > > > In XE, the new Intel GPU driver, a choice has been made to have a 1 to 1
> > > > > > > > > mapping between a drm_gpu_scheduler and drm_sched_entity. At first this
> > > > > > > > > seems a bit odd but let us explain the reasoning below.
> > > > > > > > >
> > > > > > > > > 1. In XE the submission order from multiple drm_sched_entity is not
> > > > > > > > > guaranteed to be the same as the completion order, even if targeting the
> > > > > > > > > same hardware engine. This is because in XE we have a firmware scheduler,
> > > > > > > > > the GuC, which is allowed to reorder, timeslice, and preempt submissions.
> > > > > > > > > If using a shared drm_gpu_scheduler across multiple drm_sched_entity, the
> > > > > > > > > TDR falls apart as the TDR expects submission order == completion order.
> > > > > > > > > Using a dedicated drm_gpu_scheduler per drm_sched_entity solves this
> > > > > > > > > problem.
> > > > > > >
> > > > > > > > Oh, that's interesting. I've been trying to solve the same sort of
> > > > > > > > issues to support Arm's new Mali GPU which is relying on a FW-assisted
> > > > > > > > scheduling scheme (you give the FW N streams to execute, and it does
> > > > > > > > the scheduling between those N command streams, the kernel driver
> > > > > > > > does timeslice scheduling to update the command streams passed to the
> > > > > > > > FW). I must admit I gave up on using drm_sched at some point, mostly
> > > > > > > > because the integration with drm_sched was painful, but also because I
> > > > > > > > felt trying to bend drm_sched to make it interact with a
> > > > > > > > timeslice-oriented scheduling model wasn't really future proof. Giving
> > > > > > > > drm_sched_entity exclusive access to a drm_gpu_scheduler might
> > > > > > > > help for a few things (didn't think it through yet), but I feel it's
> > > > > > > > coming short on other aspects we have to deal with on Arm GPUs.
> > > > > >
> > > > > > Ok, so I just had a quick look at the Xe driver and how it
> > > > > > instantiates the drm_sched_entity and drm_gpu_scheduler, and I think I
> > > > > > have a better understanding of how you get away with using drm_sched
> > > > > > while still controlling how scheduling is really done. Here
> > > > > > > drm_gpu_scheduler is just a dummy abstraction that lets you use the
> > > > > > drm_sched job queuing/dep/tracking mechanism. The whole run-queue    
> > > >
> > > > You nailed it here, we use the DRM scheduler for queuing jobs,
> > > > dependency tracking and releasing jobs to be scheduled when dependencies
> > > > > are met, and lastly a tracking mechanism of in-flight jobs that need to
> > > > be cleaned up if an error occurs. It doesn't actually do any scheduling
> > > > aside from the most basic level of not overflowing the submission ring
> > > > buffer. In this sense, a 1 to 1 relationship between entity and
> > > > scheduler fits quite well.
> > > >    
> > > 
> > > Yeah, I think there's an annoying difference between what AMD/NVIDIA/Intel
> > > want here and what you need for Arm thanks to the number of FW queues
> > > available. I don't remember the exact number of GuC queues but it's at
> > > least 1k. This puts it in an entirely different class from what you have on
> > > Mali. Roughly, there's about three categories here:
> > > 
> > >  1. Hardware where the kernel is placing jobs on actual HW rings. This is
> > > old Mali, Intel Haswell and earlier, and probably a bunch of others.
> > > (Intel BDW+ with execlists is a weird case that doesn't fit in this
> > > categorization.)
> > > 
> > >  2. Hardware (or firmware) with a very limited number of queues where
> > > you're going to have to juggle in the kernel in order to run desktop Linux.
> > > 
> > >  3. Firmware scheduling with a high queue count. In this case, you don't
> > > want the kernel scheduling anything. Just throw it at the firmware and let
> > > it go brrrrr.  If we ever run out of queues (unlikely), the kernel can
> > > temporarily pause some low-priority contexts and do some juggling or,
> > > frankly, just fail userspace queue creation and tell the user to close some
> > > windows.
> > > 
> > > The existence of this 2nd class is a bit annoying but it's where we are. I
> > > think it's worth recognizing that Xe and panfrost are in different places
> > > here and will require different designs. For Xe, we really are just using
> > > drm/scheduler as a front-end and the firmware does all the real scheduling.
> > > 
> > > How do we deal with class 2? That's an interesting question.  We may
> > > eventually want to break that off into a separate discussion and not litter
> > > the Xe thread but let's keep going here for a bit.  I think there are some
> > > pretty reasonable solutions but they're going to look a bit different.
> > > 
> > > The way I did this for Xe with execlists was to keep the 1:1:1 mapping
> > > between drm_gpu_scheduler, drm_sched_entity, and userspace xe_engine.
> > > Instead of feeding a GuC ring, though, it would feed a fixed-size execlist
> > > ring and then there was a tiny kernel which operated entirely in IRQ
> > > handlers which juggled those execlists by smashing HW registers.  For
> > > Panfrost, I think we want something slightly different but can borrow some
> > > ideas here.  In particular, have the schedulers feed kernel-side SW queues
> > > (they can even be fixed-size if that helps) and then have a kthread which
> > > > juggles those and feeds the limited FW queues.  In the case where you have few
> > > enough active contexts to fit them all in FW, I do think it's best to have
> > > them all active in FW and let it schedule. But with only 31, you need to be
> > > able to juggle if you run out.  
> > 
> > That's more or less what I do right now, except I don't use the
> > drm_sched front-end to handle deps or queue jobs (at least not yet). The
> > kernel-side timeslice-based scheduler juggling with runnable queues
> > (queues with pending jobs that are not yet resident on a FW slot)
> > uses a dedicated ordered-workqueue instead of a thread, with scheduler
> > ticks being handled with a delayed-work (tick happening every X
> > milliseconds when queues are waiting for a slot). It all seems very
> > HW/FW-specific though, and I think it's a bit premature to try to
> > generalize that part, but the dep-tracking logic implemented by
> > drm_sched looked like something I could easily re-use, hence my
> > interest in Xe's approach.  
> 
> So another option for these few fw queue slots schedulers would be to
> treat them as vram and enlist ttm.
> 
> Well maybe more enlist ttm and less treat them like vram, but ttm can
> handle idr (or xarray or whatever you want) and then help you with all the
> pipelining (and the drm_sched then with sorting out dependencies). If you
> then also preferentially "evict" low-priority queues you pretty much have
> the perfect thing.
> 
> Note that GuC with sriov splits up the id space and together with some
> restrictions due to multi-engine contexts media needs might also need this
> all.
> 
> If you're balking at the idea of enlisting ttm just for fw queue
> management, amdgpu has a shoddy version of id allocation for their vm/tlb
> index allocation. Might be worth it to instead lift that into some sched
> helper code.

Would you mind pointing me to the amdgpu code you're mentioning here?
Still have a hard time seeing what TTM has to do with scheduling, but I
also don't know much about TTM, so I'll keep digging.

> 
> Either way there's two imo rather solid approaches available to sort this
> out. And once you have that, then there shouldn't be any big difference in
> driver design between fw with de facto unlimited queue ids, and those with
> severe restrictions in number of queues.

Honestly, I don't think there's much difference between those two cases
already. There's just a bunch of additional code to schedule queues on
FW slots for the limited-number-of-FW-slots case, which, right now, is
driver specific. The job queuing front-end pretty much achieves what
drm_sched does already: queuing jobs to entities, checking deps,
submitting jobs to HW (in our case, writing to the command stream ring
buffer). Things start to differ after that point: once a scheduling
entity has pending jobs, we add it to one of the runnable queues (one
queue per prio) and kick the kernel-side timeslice-based scheduler to
re-evaluate, if needed.
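
To make the runnable-queue part a bit more concrete, here is a minimal
userspace sketch of the idea. It is not the pancsf code: every name in
it (sketch_sched, sketch_queue, NUM_PRIOS, NUM_FW_SLOTS, ...) is
hypothetical, the slot count is deliberately tiny, and a plain function
call stands in for the delayed-work tick. It only assumes one runnable
list per priority and a fixed number of FW slots that get refilled,
highest priority first, each time the timeslice expires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NUM_PRIOS    3  /* hypothetical number of priority levels */
#define NUM_FW_SLOTS 2  /* hypothetical, kept tiny for illustration */

struct sketch_queue {
	int id;
	int prio;                   /* 0 is the highest priority */
	bool resident;              /* currently bound to a FW slot */
	struct sketch_queue *next;  /* link in its per-prio runnable list */
};

struct sketch_sched {
	struct sketch_queue *head[NUM_PRIOS];   /* one runnable list per prio */
	struct sketch_queue *tail[NUM_PRIOS];
	struct sketch_queue *slots[NUM_FW_SLOTS];
};

void sketch_sched_init(struct sketch_sched *s)
{
	memset(s, 0, sizeof(*s));
}

/* Append a queue with pending jobs to the runnable list of its priority. */
void sketch_queue_runnable(struct sketch_sched *s, struct sketch_queue *q)
{
	assert(q->prio >= 0 && q->prio < NUM_PRIOS);
	q->next = NULL;
	if (s->tail[q->prio])
		s->tail[q->prio]->next = q;
	else
		s->head[q->prio] = q;
	s->tail[q->prio] = q;
}

/* Remove and return the first runnable queue at this priority, if any. */
static struct sketch_queue *runnable_pop(struct sketch_sched *s, int prio)
{
	struct sketch_queue *q = s->head[prio];

	if (!q)
		return NULL;
	s->head[prio] = q->next;
	if (!s->head[prio])
		s->tail[prio] = NULL;
	q->next = NULL;
	return q;
}

/* One scheduler tick: rotate residents out, then refill the FW slots. */
void sketch_tick(struct sketch_sched *s)
{
	int slot, prio;

	/* Timeslice expired: residents go back to the tail of their list. */
	for (slot = 0; slot < NUM_FW_SLOTS; slot++) {
		struct sketch_queue *q = s->slots[slot];

		if (!q)
			continue;
		q->resident = false;
		s->slots[slot] = NULL;
		sketch_queue_runnable(s, q);
	}

	/* Refill the slots, draining higher priorities first. */
	slot = 0;
	for (prio = 0; prio < NUM_PRIOS && slot < NUM_FW_SLOTS; prio++) {
		struct sketch_queue *q;

		while (slot < NUM_FW_SLOTS && (q = runnable_pop(s, prio))) {
			q->resident = true;
			s->slots[slot++] = q;
		}
	}
}
```

With three queues at the same priority and two slots, successive ticks
rotate which two are resident, and a queue that becomes runnable at a
higher priority takes a slot on the next tick. A real implementation
would skip the eviction when nothing is waiting for a slot and would
have to deal with suspending/resuming queues on the FW side, none of
which is modelled here.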

I'm all for using generic code when it makes sense, even if that means
adding this common code when it doesn't exist, but I don't want to be
dragged into some major refactoring that might take years to land.
Especially if pancsf is the first
FW-assisted-scheduler-with-few-FW-slot driver.

Here's a link to my WIP branch [1], and here is the scheduler logic
[2] if you want to have a look. Don't pay too much attention to the
driver uAPI (it's being redesigned).

Regards,

Boris

[1] https://gitlab.freedesktop.org/bbrezillon/linux/-/tree/pancsf
[2] https://gitlab.freedesktop.org/bbrezillon/linux/-/blob/pancsf/drivers/gpu/drm/pancsf/pancsf_sched.c
