Re: [Intel-gfx] [RFC PATCH 04/20] drm/sched: Convert drm scheduler to use a work queue rather than kthread

From: Matthew Brost <matthew.brost@intel.com>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Subject: Re: [Intel-gfx] [RFC PATCH 04/20] drm/sched: Convert drm scheduler to use a work queue rather than kthread
Date: Wed, 11 Jan 2023 18:07:04 +0000
Message-ID: <Y776yIC+iJDlchjo@DUT025-TGLU.fm.intel.com>
In-Reply-To: <703310df-21c8-57ac-8b27-4ae342265df1@linux.intel.com>

On Wed, Jan 11, 2023 at 09:17:01AM +0000, Tvrtko Ursulin wrote:
> 
> On 10/01/2023 19:01, Matthew Brost wrote:
> > On Tue, Jan 10, 2023 at 04:50:55PM +0000, Tvrtko Ursulin wrote:
> > > 
> > > On 10/01/2023 15:55, Matthew Brost wrote:
> > > > On Tue, Jan 10, 2023 at 12:19:35PM +0000, Tvrtko Ursulin wrote:
> > > > > 
> > > > > On 10/01/2023 11:28, Tvrtko Ursulin wrote:
> > > > > > 
> > > > > > 
> > > > > > On 09/01/2023 17:27, Jason Ekstrand wrote:
> > > > > > 
> > > > > > [snip]
> > > > > > 
> > > > > > > >        >>> AFAICT it proposes to have 1:1 between *userspace* created contexts (per
> > > > > > > >        >>> context _and_ engine) and drm_sched. I am not sure avoiding invasive changes
> > > > > > > >        >>> to the shared code is in the spirit of the overall idea and instead
> > > > > > > >        >>> the opportunity should be used to look at ways to refactor/improve drm_sched.
> > > > > > > 
> > > > > > > 
> > > > > > > Maybe?  I'm not convinced that what Xe is doing is an abuse at all
> > > > > > > or really needs to drive a re-factor.  (More on that later.)
> > > > > > > There's only one real issue which is that it fires off potentially a
> > > > > > > lot of kthreads. Even that's not that bad given that kthreads are
> > > > > > > pretty light and you're not likely to have more kthreads than
> > > > > > > userspace threads which are much heavier.  Not ideal, but not the
> > > > > > > end of the world either.  Definitely something we can/should
> > > > > > > optimize but if we went through with Xe without this patch, it would
> > > > > > > probably be mostly ok.
> > > > > > > 
> > > > > > > >        >> Yes, it is 1:1 *userspace* engines and drm_sched.
> > > > > > > >        >>
> > > > > > > >        >> I'm not really prepared to make large changes to DRM scheduler at the
> > > > > > > >        >> moment for Xe as they are not really required nor does Boris seem to think they
> > > > > > > >        >> will be required for his work either. I am interested to see what Boris
> > > > > > > >        >> comes up with.
> > > > > > > >        >>
> > > > > > > >        >>> Even on the low level, the idea to replace drm_sched threads with workers
> > > > > > > >        >>> has a few problems.
> > > > > > >        >>>
> > > > > > >        >>> To start with, the pattern of:
> > > > > > >        >>>
> > > > > > >        >>>    while (not_stopped) {
> > > > > > >        >>>     keep picking jobs
> > > > > > >        >>>    }
> > > > > > >        >>>
> > > > > > > >        >>> Feels fundamentally in disagreement with workers (while obviously fits
> > > > > > > >        >>> perfectly with the current kthread design).
> > > > > > > >        >>
> > > > > > > >        >> The while loop breaks and worker exits if no jobs are ready.
> > > > > > > 
> > > > > > > 
> > > > > > > I'm not very familiar with workqueues. What are you saying would fit
> > > > > > > better? One scheduling job per work item rather than one big work
> > > > > > > item which handles all available jobs?
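
A minimal sketch of the two shapes being contrasted above, written as plain userspace C rather than real workqueue code. The struct and both function names are hypothetical stand-ins, not drm_sched API: work_drain_all() is the "one work item handles all ready jobs, then exits" shape, and work_run_one() is the "one job per work item" shape, where the return value says whether the item would need to be queued again.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a scheduler instance; not drm_gpu_scheduler. */
struct toy_sched {
	size_t ready;     /* jobs currently ready to run */
	size_t submitted; /* jobs handed to the backend so far */
	bool stopped;     /* set when the scheduler is being stopped */
};

/*
 * Shape A: a single work item keeps picking jobs while any are ready,
 * then returns. Its runtime grows with the number of ready jobs.
 * Returns how many jobs this invocation submitted.
 */
static size_t work_drain_all(struct toy_sched *s)
{
	size_t n = 0;

	while (!s->stopped && s->ready > 0) {
		s->ready--;
		s->submitted++;
		n++;
	}
	return n; /* worker exits; it would be queued again on the next ready job */
}

/*
 * Shape B: each work item submits at most one job. Returns true when
 * more jobs are ready, i.e. the caller should queue the item again.
 */
static bool work_run_one(struct toy_sched *s)
{
	if (s->stopped || s->ready == 0)
		return false;

	s->ready--;
	s->submitted++;
	return s->ready > 0;
}
```

With three ready jobs, shape A finishes in one invocation while shape B takes three short ones, which bounds how long any single work item runs at the cost of more queueing.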
> > > > > > 
> > > > > > > Yes and no, it indeed IMO does not fit to have a work item which is
> > > > > > > potentially unbounded in runtime. But it is a somewhat moot conceptual
> > > > > > > mismatch because it is a worst case / theoretical one, and I think there
> > > > > > > are more fundamental concerns.
> > > > > > 
> > > > > > If we have to go back to the low level side of things, I've picked this
> > > > > > random spot to consolidate what I have already mentioned and perhaps
> > > > > > expand.
> > > > > > 
> > > > > > To start with, let me pull out some thoughts from workqueue.rst:
> > > > > > 
> > > > > > """
> > > > > > Generally, work items are not expected to hog a CPU and consume many
> > > > > > cycles. That means maintaining just enough concurrency to prevent work
> > > > > > processing from stalling should be optimal.
> > > > > > """
> > > > > > 
> > > > > > For unbound queues:
> > > > > > """
> > > > > > The responsibility of regulating concurrency level is on the users.
> > > > > > """
> > > > > > 
> > > > > > Given the unbound queues will be spawned on demand to service all queued
> > > > > > work items (more interesting when mixing up with the system_unbound_wq),
> > > > > > in the proposed design the number of instantiated worker threads does
> > > > > > not correspond to the number of user threads (as you have elsewhere
> > > > > > stated), but pessimistically to the number of active user contexts. That
> > > > > > is the number which drives the maximum number of not-runnable jobs that
> > > > > > can become runnable at once, and hence spawn that many work items, and
> > > > > > in turn unbound worker threads.
> > > > > > 
> > > > > > Several problems there.
> > > > > > 
> > > > > > It is fundamentally pointless to have potentially that many more threads
> > > > > > than the number of CPU cores - it simply creates a scheduling storm.
> > > > > 
> > > > > To make matters worse, if I follow the code correctly, all these per user
> > > > > context worker thread / work items end up contending on the same lock or
> > > > > circular buffer, both are one instance per GPU:
> > > > > 
> > > > > guc_engine_run_job
> > > > >    -> submit_engine
> > > > >       a) wq_item_append
> > > > >           -> wq_wait_for_space
> > > > >             -> msleep
> > > > 
> > > > a) is dedicated per xe_engine
> > > 
> > > Hah true, what is it for then? I thought throttling the LRCA ring is done via:
> > > 
> > 
> > This is a per guc_id 'work queue' which is used for parallel submission
> > (e.g. multiple LRC tail values need to be written atomically by the GuC).
> > Again in practice there should always be space.
> 
> Speaking of guc id, where does blocking when none are available happen in
> the non parallel case?
> 

We have 64k guc_ids on native, 1k guc_ids with 64k VFs. Either way we
think that is more than enough and can just reject xe_engine creation if
we run out of guc_ids. If this proves to be false, we can fix this, but the
guc_id stealing in the i915 is rather complicated and hopefully not needed.

We will limit the number of guc_ids allowed per user pid to a reasonable
number to prevent a DoS. Elevated pids (e.g. IGTs) will be able to do
whatever they want.
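
As a rough illustration of that policy (reject on exhaustion, cap per pid, no cap for elevated pids), allocation could look something like the sketch below. This is not the Xe code: the pool size, the per-pid cap, the error codes and every name (id_pool, pool_alloc, pool_free) are assumptions made up for the example, and the `privileged` flag simply stands in for an elevated pid.

```c
#include <stdbool.h>

/* Hypothetical sizes, kept tiny so the behaviour is easy to follow. */
#define NUM_IDS         8 /* stand-in for the 64k / 1k guc_id space */
#define MAX_IDS_PER_PID 3 /* stand-in for the per-pid limit */

#define ID_FREE         (-1) /* owner value of an unused id */
#define POOL_ERR_NOSPC  (-2) /* pool exhausted: creation is rejected */
#define POOL_ERR_QUOTA  (-3) /* caller already holds its per-pid maximum */

struct id_pool {
	int owner[NUM_IDS]; /* owning pid per id, or ID_FREE */
};

static void pool_init(struct id_pool *p)
{
	for (int i = 0; i < NUM_IDS; i++)
		p->owner[i] = ID_FREE;
}

/*
 * Returns the allocated id (>= 0) or a negative POOL_ERR_* code.
 * Nothing is stolen from other owners; a full pool is simply an error.
 */
static int pool_alloc(struct id_pool *p, int pid, bool privileged)
{
	int held = 0;
	int free_id = -1;

	for (int i = 0; i < NUM_IDS; i++) {
		if (p->owner[i] == pid)
			held++;
		else if (p->owner[i] == ID_FREE && free_id < 0)
			free_id = i;
	}

	/* Unprivileged callers are capped so one pid cannot drain the pool. */
	if (!privileged && held >= MAX_IDS_PER_PID)
		return POOL_ERR_QUOTA;
	if (free_id < 0)
		return POOL_ERR_NOSPC;

	p->owner[free_id] = pid;
	return free_id;
}

static void pool_free(struct id_pool *p, int id)
{
	p->owner[id] = ID_FREE;
}
```

The point of the sketch is only that both failure paths are plain errors returned to the creator, so no stealing logic is needed.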

> > >    drm_sched_init(&ge->sched, &drm_sched_ops,
> > > 		 e->lrc[0].ring.size / MAX_JOB_SIZE_BYTES,
> > > 
> > > Is there something more to throttle other than the ring? It is throttling
> > > something using msleeps..
> > > 
> > > > Also you missed the step of programming the ring which is dedicated per xe_engine
> > > 
> > > I was trying to quickly find places which serialize on something in the
> > > backend, ringbuffer emission did not seem to do that but maybe I missed
> > > something.
> > > 
> > 
> > xe_ring_ops vfunc emit_job is called to write the ring.
> 
> Right but does it serialize between different contexts, I didn't spot that
> it does in which case it wasn't relevant to the sub story.
>

Right just saying this is an additional step that is done in parallel
between xe_engines.
 
> > > > 
> > > > >       b) xe_guc_ct_send
> > > > >           -> guc_ct_send
> > > > >             -> mutex_lock(&ct->lock);
> > > > >             -> later a potential msleep in h2g_has_room
> > > > 
> > > > Technically there is 1 instance per GT not GPU, yes this is shared but
> > > > in practice there will always be space in the CT channel so contention
> > > > on the lock should be rare.
> > > 
> > > Yeah I used the term GPU to be more understandable to outside audience.
> > > 
> > > I am somewhat disappointed that the Xe opportunity hasn't been used to
> > > improve upon the CT communication bottlenecks. I mean those backoff sleeps
> > > and lock contention. I wish there would be a single thread in charge of the
> > > CT channel and internal users (other parts of the driver) would be able to
> > > send their requests to it in a more efficient manner, with less lock
> > > contention and centralized backoff.
> > > 
> > 
> > Well the CT backend was more or less a complete rewrite. Mutexes
> > actually work rather well to ensure fairness compared to the spin locks
> > used in the i915. This code was pretty heavily reviewed by Daniel and
> > both of us landed a big mutex for all of the CT code compared to the 3
> > or 4 spin locks used in the i915.
> 
> Are the "nb" sends gone? But that aside, I wasn't meaning just the locking
> but the high level approach. Never  mind.
>

xe_guc_ct_send is non-blocking, xe_guc_ct_send_block is blocking. I
don't think the latter is used yet.
 
> > > > I haven't read your rather long reply yet, but also FWIW using a
> > > > workqueue was suggested by AMD (original authors of the DRM scheduler)
> > > > when we ran this design by them.
> > > 
> > > Commit message says nothing about that. ;)
> > > 
> > 
> > Yea I missed that, will fix in the next rev. Just dug through my emails
> > and Christian suggested a work queue and Andrey also gave some input on
> > the DRM scheduler design.
> > 
> > Also in the next rev I will likely update the run_wq to be passed in by the
> > user.
> 
> Yes, and IMO that may need to be non-optional.
>

Yea, will fix.

Matt
 
> Regards,
> 
> Tvrtko
