Re: [Intel-xe] [PATCH v2 1/9] drm/sched: Convert drm scheduler to use a work queue rather than kthread

Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed

From: Matthew Brost <matthew.brost@intel.com>
To: Danilo Krummrich <dakr@redhat.com>
Cc: robdclark@chromium.org, sarah.walker@imgtec.com,
	ketil.johnsen@arm.com, lina@asahilina.net, Liviu.Dudau@arm.com,
	dri-devel@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
	luben.tuikov@amd.com, donald.robson@imgtec.com,
	boris.brezillon@collabora.com,
	"Christian König" <christian.koenig@amd.com>,
	faith.ekstrand@collabora.com
Subject: Re: [Intel-xe] [PATCH v2 1/9] drm/sched: Convert drm scheduler to use a work queue rather than kthread
Date: Fri, 18 Aug 2023 03:08:52 +0000	[thread overview]
Message-ID: <ZN7gxBpnZD8OW9ZW@DUT025-TGLU.fm.intel.com> (raw)
In-Reply-To: <e8fa305a-0ac8-ece7-efeb-f9cec2892d44@redhat.com>

On Thu, Aug 17, 2023 at 01:13:31PM +0200, Danilo Krummrich wrote:
> On 8/17/23 07:33, Christian König wrote:
> > Am 16.08.23 um 18:33 schrieb Danilo Krummrich:
> > > On 8/16/23 16:59, Christian König wrote:
> > > > Am 16.08.23 um 14:30 schrieb Danilo Krummrich:
> > > > > On 8/16/23 16:05, Christian König wrote:
> > > > > > Am 16.08.23 um 13:30 schrieb Danilo Krummrich:
> > > > > > > Hi Matt,
> > > > > > > 
> > > > > > > On 8/11/23 04:31, Matthew Brost wrote:
> > > > > > > > In XE, the new Intel GPU driver, a choice has made to have a 1 to 1
> > > > > > > > mapping between a drm_gpu_scheduler and
> > > > > > > > drm_sched_entity. At first this
> > > > > > > > seems a bit odd but let us explain the reasoning below.
> > > > > > > > 
> > > > > > > > 1. In XE the submission order from multiple drm_sched_entity is not
> > > > > > > > guaranteed to be the same completion even if
> > > > > > > > targeting the same hardware
> > > > > > > > engine. This is because in XE we have a firmware scheduler, the GuC,
> > > > > > > > which allowed to reorder, timeslice, and preempt
> > > > > > > > submissions. If a using
> > > > > > > > shared drm_gpu_scheduler across multiple
> > > > > > > > drm_sched_entity, the TDR falls
> > > > > > > > apart as the TDR expects submission order ==
> > > > > > > > completion order. Using a
> > > > > > > > dedicated drm_gpu_scheduler per drm_sched_entity solve this problem.
> > > > > > > > 
> > > > > > > > 2. In XE submissions are done via programming a
> > > > > > > > ring buffer (circular
> > > > > > > > buffer), a drm_gpu_scheduler provides a limit on
> > > > > > > > number of jobs, if the
> > > > > > > > limit of number jobs is set to RING_SIZE /
> > > > > > > > MAX_SIZE_PER_JOB we get flow
> > > > > > > > control on the ring for free.
> > > > > > > 
> > > > > > > In XE, where does the limitation of MAX_SIZE_PER_JOB come from?
> > > > > > > 
> > > > > > > In Nouveau we currently do have such a limitation as
> > > > > > > well, but it is derived from the RING_SIZE, hence
> > > > > > > RING_SIZE / MAX_SIZE_PER_JOB would always be 1.
> > > > > > > However, I think most jobs won't actually utilize
> > > > > > > the whole ring.
> > > > > > 
> > > > > > Well that should probably rather be RING_SIZE /
> > > > > > MAX_SIZE_PER_JOB = hw_submission_limit (or even
> > > > > > hw_submission_limit - 1 when the hw can't distinct full
> > > > > > vs empty ring buffer).
> > > > > 
> > > > > Not sure if I get you right, let me try to clarify what I
> > > > > was trying to say: I wanted to say that in Nouveau
> > > > > MAX_SIZE_PER_JOB isn't really limited by anything other than
> > > > > the RING_SIZE and hence we'd never allow more than 1 active
> > > > > job.
> > > > 
> > > > But that lets the hw run dry between submissions. That is
> > > > usually a pretty horrible idea for performance.
> > > 
> > > Correct, that's the reason why I said it seems to be more efficient
> > > to base ring flow control on the actual size of each incoming job
> > > rather than the maximum size of a job.
> > > 
> > > > 
> > > > > 
> > > > > However, it seems to be more efficient to base ring flow
> > > > > control on the actual size of each incoming job rather than
> > > > > the worst case, namely the maximum size of a job.
> > > > 
> > > > That doesn't sounds like a good idea to me. See we don't limit
> > > > the number of submitted jobs based on the ring size, but rather
> > > > we calculate the ring size based on the number of submitted
> > > > jobs.
> > > > 
> > > 
> > > My point isn't really about whether we derive the ring size from the
> > > job limit or the other way around. It's more about the job size (or
> > > its maximum size) being arbitrary.
> > > 
> > > As mentioned in my reply to Matt:
> > > 
> > > "In Nouveau, userspace can submit an arbitrary amount of addresses
> > > of indirect bufferes containing the ring instructions. The ring on
> > > the kernel side takes the addresses of the indirect buffers rather
> > > than the instructions themself. Hence, technically there isn't
> > > really a limit on the amount of IBs submitted by a job except for
> > > the ring size."
> > > 
> > > So, my point is that I don't really want to limit the job size
> > > artificially just to be able to fit multiple jobs into the ring even
> > > if they're submitted at their "artificial" maximum size, but rather
> > > track how much of the ring the submitted job actually occupies.
> > > 
> > > > In other words the hw_submission_limit defines the ring size,
> > > > not the other way around. And you usually want the
> > > > hw_submission_limit as low as possible for good scheduler
> > > > granularity and to avoid extra overhead.
> > > 
> > > I don't think you really mean "as low as possible", do you?
> > 
> > No, I do mean as low as possible or in other words as few as possible.
> > 
> > Ideally the scheduler would submit only the minimum amount of work to
> > the hardware to keep the hardware busy. >
> > The hardware seems to work mostly the same for all vendors, but you
> > somehow seem to think that filling the ring is somehow beneficial which
> > is really not the case as far as I can see.
> 
> I think that's a misunderstanding. I'm not trying to say that it is *always*
> beneficial to fill up the ring as much as possible. But I think it is under
> certain circumstances, exactly those circumstances I described for Nouveau.
> 
> As mentioned, in Nouveau the size of a job is only really limited by the
> ring size, which means that one job can (but does not necessarily) fill up
> the whole ring. We both agree that this is inefficient, because it
> potentially results into the HW run dry due to hw_submission_limit == 1.
> 
> I recognize you said that one should define hw_submission_limit and adjust
> the other parts of the equation accordingly, the options I see are:
> 
> (1) Increase the ring size while keeping the maximum job size.
> (2) Decrease the maximum job size while keeping the ring size.
> (3) Let the scheduler track the actual job size rather than the maximum job
> size.
> 
> (1) results into potentially wasted ring memory, because we're not always
> reaching the maximum job size, but the scheduler assumes so.
> 
> (2) results into more IOCTLs from userspace for the same amount of IBs and
> more jobs result into more memory allocations and more work being submitted
> to the workqueue (with Matt's patches).
> 
> (3) doesn't seem to have any of those draw backs.
> 
> What would be your take on that?
> 
> Actually, if none of the other drivers is interested into a more precise way
> of keeping track of the ring utilization, I'd be totally fine to do it in a
> driver specific way. However, unfortunately I don't see how this would be
> possible.
> 
> My proposal would be to just keep the hw_submission_limit (maybe rename it
> to submission_unit_limit) and add a submission_units field to struct
> drm_sched_job. By default a jobs submission_units field would be 0 and the
> scheduler would behave the exact same way as it does now.
> 
> Accordingly, jobs with submission_units > 1 would contribute more than one
> unit to the submission_unit_limit.
> 
> What do you think about that?
> 

This seems reasonible to me and a very minimal change to the scheduler.

Matt 

> Besides all that, you said that filling up the ring just enough to not let
> the HW run dry rather than filling it up entirely is desirable. Why do you
> think so? I tend to think that in most cases it shouldn't make difference.
> 
> - Danilo
> 
> > 
> > Regards,
> > Christian.
> > 
> > > Because one really is the minimum if you want to do work at all, but
> > > as you mentioned above a job limit of one can let the ring run dry.
> > > 
> > > In the end my proposal comes down to tracking the actual size of a
> > > job rather than just assuming a pre-defined maximum job size, and
> > > hence a dynamic job limit.
> > > 
> > > I don't think this would hurt the scheduler granularity. In fact, it
> > > should even contribute to the desire of not letting the ring run dry
> > > even better. Especially for sequences of small jobs, where the
> > > current implementation might wrongly assume the ring is already full
> > > although actually there would still be enough space left.
> > > 
> > > > 
> > > > Christian.
> > > > 
> > > > > 
> > > > > > 
> > > > > > Otherwise your scheduler might just overwrite the ring
> > > > > > buffer by pushing things to fast.
> > > > > > 
> > > > > > Christian.
> > > > > > 
> > > > > > > 
> > > > > > > Given that, it seems like it would be better to let
> > > > > > > the scheduler keep track of empty ring "slots"
> > > > > > > instead, such that the scheduler can deceide whether
> > > > > > > a subsequent job will still fit on the ring and if
> > > > > > > not re-evaluate once a previous job finished. Of
> > > > > > > course each submitted job would be required to carry
> > > > > > > the number of slots it requires on the ring.
> > > > > > > 
> > > > > > > What to you think of implementing this as
> > > > > > > alternative flow control mechanism? Implementation
> > > > > > > wise this could be a union with the existing
> > > > > > > hw_submission_limit.
> > > > > > > 
> > > > > > > - Danilo
> > > > > > > 
> > > > > > > > 
> > > > > > > > A problem with this design is currently a drm_gpu_scheduler uses a
> > > > > > > > kthread for submission / job cleanup. This doesn't scale if a large
> > > > > > > > number of drm_gpu_scheduler are used. To work
> > > > > > > > around the scaling issue,
> > > > > > > > use a worker rather than kthread for submission / job cleanup.
> > > > > > > > 
> > > > > > > > v2:
> > > > > > > >    - (Rob Clark) Fix msm build
> > > > > > > >    - Pass in run work queue
> > > > > > > > v3:
> > > > > > > >    - (Boris) don't have loop in worker
> > > > > > > > v4:
> > > > > > > >    - (Tvrtko) break out submit ready, stop,
> > > > > > > > start helpers into own patch
> > > > > > > > 
> > > > > > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
>

next prev parent reply	other threads:[~2023-08-18  3:09 UTC|newest]

Thread overview: 83+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-11  2:31 [Intel-xe] [PATCH v2 0/9] DRM scheduler changes for Xe Matthew Brost
2023-08-11  2:31 ` [Intel-xe] [PATCH v2 1/9] drm/sched: Convert drm scheduler to use a work queue rather than kthread Matthew Brost
2023-08-16 11:30   ` Danilo Krummrich
2023-08-16 14:05     ` Christian König
2023-08-16 12:30       ` Danilo Krummrich
2023-08-16 14:38         ` Matthew Brost
2023-08-16 15:40           ` Danilo Krummrich
2023-08-16 14:59         ` Christian König
2023-08-16 16:33           ` Danilo Krummrich
2023-08-17  5:33             ` Christian König
2023-08-17 11:13               ` Danilo Krummrich
2023-08-17 13:35                 ` Christian König
2023-08-17 12:48                   ` Danilo Krummrich
2023-08-17 16:17                     ` Christian König
2023-08-18 11:58                       ` Danilo Krummrich
2023-08-21 14:07                         ` Christian König
2023-08-21 18:01                           ` Danilo Krummrich
2023-08-21 18:12                             ` Christian König
2023-08-21 19:07                               ` Danilo Krummrich
2023-08-22  9:35                                 ` Christian König
2023-08-21 19:46                               ` Faith Ekstrand
2023-08-22  9:51                                 ` Christian König
2023-08-22 16:55                                   ` Faith Ekstrand
2023-08-24 11:50                                     ` Bas Nieuwenhuizen
2023-08-18  3:08                 ` Matthew Brost [this message]
2023-08-18  5:40                   ` Christian König
2023-08-18 12:49                     ` Matthew Brost
2023-08-18 12:06                       ` Danilo Krummrich
2023-09-12 14:28                 ` Boris Brezillon
2023-09-12 14:33                   ` Danilo Krummrich
2023-09-12 14:49                     ` Boris Brezillon
2023-09-12 15:13                       ` Boris Brezillon
2023-09-12 16:58                         ` Danilo Krummrich
2023-09-12 16:52                       ` Danilo Krummrich
2023-08-11  2:31 ` [Intel-xe] [PATCH v2 2/9] drm/sched: Move schedule policy to scheduler / entity Matthew Brost
2023-08-11 21:43   ` Maira Canal
2023-08-12  3:20     ` Matthew Brost
2023-08-11  2:31 ` [Intel-xe] [PATCH v2 3/9] drm/sched: Add DRM_SCHED_POLICY_SINGLE_ENTITY scheduling policy Matthew Brost
2023-08-29 17:37   ` Danilo Krummrich
2023-09-05 11:10     ` Danilo Krummrich
2023-09-11 19:44       ` Matthew Brost
2023-08-11  2:31 ` [Intel-xe] [PATCH v2 4/9] drm/sched: Split free_job into own work item Matthew Brost
2023-08-17 13:39   ` Christian König
2023-08-17 17:54     ` Matthew Brost
2023-08-18  5:27       ` Christian König
2023-08-18 13:13         ` Matthew Brost
2023-08-21 13:17           ` Christian König
2023-08-23  3:27             ` Matthew Brost
2023-08-23  7:10               ` Christian König
2023-08-23 15:24                 ` Matthew Brost
2023-08-23 15:41                   ` Alex Deucher
2023-08-23 17:26                     ` Rodrigo Vivi
2023-08-23 23:12                       ` Matthew Brost
2023-08-24 11:44                         ` Christian König
2023-08-24 14:30                           ` Matthew Brost
2023-08-24 23:04   ` Danilo Krummrich
2023-08-25  2:58     ` Matthew Brost
2023-08-25  8:02       ` Christian König
2023-08-25 13:36         ` Matthew Brost
2023-08-25 13:45           ` Christian König
2023-09-12 10:13             ` Boris Brezillon
2023-09-12 10:46               ` Danilo Krummrich
2023-09-12 12:18                 ` Boris Brezillon
2023-09-12 12:56                   ` Danilo Krummrich
2023-09-12 13:52                     ` Boris Brezillon
2023-09-12 14:10                       ` Danilo Krummrich
2023-09-12 13:27             ` Boris Brezillon
2023-09-12 13:34               ` Danilo Krummrich
2023-09-12 13:53                 ` Boris Brezillon
2023-08-28 18:04   ` Danilo Krummrich
2023-08-28 18:41     ` Matthew Brost
2023-08-29  1:20       ` Danilo Krummrich
2023-08-11  2:31 ` [Intel-xe] [PATCH v2 5/9] drm/sched: Add generic scheduler message interface Matthew Brost
2023-08-11  2:31 ` [Intel-xe] [PATCH v2 6/9] drm/sched: Add drm_sched_start_timeout_unlocked helper Matthew Brost
2023-08-11  2:31 ` [Intel-xe] [PATCH v2 7/9] drm/sched: Start run wq before TDR in drm_sched_start Matthew Brost
2023-08-11  2:31 ` [Intel-xe] [PATCH v2 8/9] drm/sched: Submit job before starting TDR Matthew Brost
2023-08-11  2:31 ` [Intel-xe] [PATCH v2 9/9] drm/sched: Add helper to set TDR timeout Matthew Brost
2023-08-11  2:34 ` [Intel-xe] ✗ CI.Patch_applied: failure for DRM scheduler changes for Xe (rev2) Patchwork
2023-08-24  0:08 ` [Intel-xe] [PATCH v2 0/9] DRM scheduler changes for Xe Danilo Krummrich
2023-08-24  3:23   ` Matthew Brost
2023-08-24 14:51     ` Danilo Krummrich
2023-08-25  3:01 ` [Intel-xe] ✗ CI.Patch_applied: failure for DRM scheduler changes for Xe (rev3) Patchwork
2023-09-05 11:13 ` [Intel-xe] ✗ CI.Patch_applied: failure for DRM scheduler changes for Xe (rev4) Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZN7gxBpnZD8OW9ZW@DUT025-TGLU.fm.intel.com \
    --to=matthew.brost@intel.com \
    --cc=Liviu.Dudau@arm.com \
    --cc=boris.brezillon@collabora.com \
    --cc=christian.koenig@amd.com \
    --cc=dakr@redhat.com \
    --cc=donald.robson@imgtec.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=faith.ekstrand@collabora.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=ketil.johnsen@arm.com \
    --cc=lina@asahilina.net \
    --cc=luben.tuikov@amd.com \
    --cc=robdclark@chromium.org \
    --cc=sarah.walker@imgtec.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox