Intel-GFX Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Matthew Brost <matthew.brost@intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH 36/56] drm/i915: Fair low-latency scheduling
Date: Thu, 7 Jan 2021 09:10:35 -0800	[thread overview]
Message-ID: <20210107171035.GA9023@sdutt-i7> (raw)
In-Reply-To: <161003793126.28368.13767958122371874524@build.alporthouse.com>

On Thu, Jan 07, 2021 at 04:45:31PM +0000, Chris Wilson wrote:
> Quoting Matthew Brost (2021-01-07 16:05:07)
> > On Tue, Dec 29, 2020 at 12:01:25PM +0000, Chris Wilson wrote:
> > > The first "scheduler" was a topographical sorting of requests into
> > > priority order. The execution order was deterministic, the earliest
> > > submitted, highest priority request would be executed first. Priority
> > > inheritance ensured that inversions were kept at bay, and allowed us to
> > > dynamically boost priorities (e.g. for interactive pageflips).
> > > 
> > > The minimalistic timeslicing scheme was an attempt to introduce fairness
> > > between long running requests, by evicting the active request at the end
> > > of a timeslice and moving it to the back of its priority queue (while
> > > ensuring that dependencies were kept in order). For short running
> > > requests from many clients of equal priority, the scheme is still very
> > > much FIFO submission ordering, and as unfair as before.
> > > 
> > > To impose fairness, we need an external metric that ensures that clients
> > > are interpersed, so we don't execute one long chain from client A before
> > > executing any of client B. This could be imposed by the clients
> > > themselves by using fences based on an external clock, that is they only
> > > submit work for a "frame" at frame-intervals, instead of submitting as
> > > much work as they are able to. The standard SwapBuffers approach is akin
> > > to double bufferring, where as one frame is being executed, the next is
> > > being submitted, such that there is always a maximum of two frames per
> > > client in the pipeline and so ideally maintains consistent input-output
> > > latency. Even this scheme exhibits unfairness under load as a single
> > > client will execute two frames back to back before the next, and with
> > > enough clients, deadlines will be missed.
> > > 
> > > The idea introduced by BFS/MuQSS is that fairness is introduced by
> > > metering with an external clock. Every request, when it becomes ready to
> > > execute is assigned a virtual deadline, and execution order is then
> > > determined by earliest deadline. Priority is used as a hint, rather than
> > > strict ordering, where high priority requests have earlier deadlines,
> > > but not necessarily earlier than outstanding work. Thus work is executed
> > > in order of 'readiness', with timeslicing to demote long running work.
> > > 
> > > The Achille's heel of this scheduler is its strong preference for
> > > low-latency and favouring of new queues. Whereas it was easy to dominate
> > > the old scheduler by flooding it with many requests over a short period
> > > of time, the new scheduler can be dominated by a 'synchronous' client
> > > that waits for each of its requests to complete before submitting the
> > > next. As such a client has no history, it is always considered
> > > ready-to-run and receives an earlier deadline than the long running
> > > requests. This is compensated for by refreshing the current execution's
> > > deadline and by disallowing preemption for timeslice shuffling.
> > > 
> > > To check the impact on throughput (often the downfall of latency
> > > sensitive schedulers), we used gem_wsim to simulate various transcode
> > > workloads with different load balancers, and varying the number of
> > > competing [heterogenous] clients.
> > > 
> > > +delta%------------------------------------------------------------------+
> > > |                                a                                       |
> > > |                                a                                       |
> > > |                                aa                                      |
> > > |                                aa                                      |
> > > |                                aa                                      |
> > > |                                aa                                      |
> > > |                               aaa                                      |
> > > |                              aaaa                                      |
> > > |                           a  aaaaa                                     |
> > > |                           a aaaaaa                                     |
> > > |a              aa   a      aaaaaaaaaa aa               a               a|
> > > |                                A_|                                     |
> > > +------------------------------------------------------------------------+
> > >    N          Min           Max        Median           Avg        Stddev
> > >  108   -23.982194     28.421527  -0.077474828  -0.072650418    0.16179718
> > > 
> > > The impact was on average 0.1% under contention due to the change in
> > > context execution order and number of context switches. The biggest
> > > swings are due to the execution ordering favouring one client or another,
> > > and maybe room for improvement.
> > > 
> > 
> > I haven't dug into this series deeply but it does seem plausible for
> > execlist submission. However this new scheduler seems completely broken
> > for Guc submission.
> 
> Underneath it's the same topological sort, just with a different key.
> At worst, that key can be the plain priority again. But that's pretty
> much immaterial in terms of deciding the order which requests are
> transferred from one queue to another. Execution order wants to be based
> on resource prioritisation (beyond request priorisation), ideally we
> would keep our fine grained scheduling control.
> 

So maybe just hook for the key?

> I'm not convinced by the state of the fw's scheduler...
>

Can't really argue here, but I think it would be really helpful if you
raised your concerns with the fw team so that can be addressed.
 
> > The scheduler really isn't used that much with GuC
> > submission but when it is the old one works perfectly. The vfunc for
> > the scheduler is deleted in patch #25. I don't think that is good idea
> > as it appears we are trending to 2 separate schedulers.
> 
> You still use the i915_request_set_priority() entry point as the
> caller into the backend bypass.
>

I thought we want to avoid backend specific branches, hence vfuncs.

> Isn't the current guc.fw a second closed driver for the same HW anyway?
>

Not sure I follow this.
 
> > Lastly this series seems to be on the wrong list, I believe it should be
> > posted elsewhere first.
> 
> It's the right place for upstreaming bug fixes on stable platforms. And
> after a year of being here, that's a moot point.

Wow, I feel your pain if this has been out here for a year. But still it
is wrong place. I can help and will help with the reviews if it is in
the correct spot.

Matt

> -Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2021-01-07 17:15 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-29 12:00 [Intel-gfx] [PATCH 01/56] drm/i915/gt: Restore ce->signal flush before releasing virtual engine Chris Wilson
2020-12-29 12:00 ` [Intel-gfx] [PATCH 02/56] drm/i915/gt: Only retire on the last breadcrumb if the last request Chris Wilson
2020-12-29 12:00 ` [Intel-gfx] [PATCH 03/56] drm/i915/gt: Cancel submitted requests upon context reset Chris Wilson
2020-12-30 21:07   ` Mika Kuoppala
2020-12-29 12:00 ` [Intel-gfx] [PATCH 04/56] drm/i915/gt: Pull context closure check from request submit to schedule-in Chris Wilson
2020-12-29 12:00 ` [Intel-gfx] [PATCH 05/56] drm/i915/gem: Peek at the inflight context Chris Wilson
2020-12-29 12:00 ` [Intel-gfx] [PATCH 06/56] drm/i915: Mark up protected uses of 'i915_request_completed' Chris Wilson
2020-12-29 12:00 ` [Intel-gfx] [PATCH 07/56] drm/i915: Drop i915_request.lock serialisation around await_start Chris Wilson
2020-12-29 12:00 ` [Intel-gfx] [PATCH 08/56] drm/i915: Drop i915_request.lock requirement for intel_rps_boost() Chris Wilson
2020-12-29 12:00 ` [Intel-gfx] [PATCH 09/56] drm/i915/gem: Reduce ctx->engine_mutex for reading the clone source Chris Wilson
2020-12-29 12:00 ` [Intel-gfx] [PATCH 10/56] drm/i915/gem: Reduce ctx->engines_mutex for get_engines() Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 11/56] drm/i915: Reduce test_and_set_bit to set_bit in i915_request_submit() Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 12/56] drm/i915/gt: Drop atomic for engine->fw_active tracking Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 13/56] drm/i915/gt: Extract busy-stats for ring-scheduler Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 14/56] drm/i915/gt: Convert stats.active to plain unsigned int Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 15/56] drm/i915/gt: Do not suspend bonded requests if one hangs Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 16/56] drm/i915/gt: Remove timeslice suppression Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 17/56] drm/i915/gt: Skip over completed active execlists, again Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 18/56] drm/i915: Strip out internal priorities Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 19/56] drm/i915: Remove I915_USER_PRIORITY_SHIFT Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 20/56] drm/i915: Replace engine->schedule() with a known request operation Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 21/56] drm/i915: Teach the i915_dependency to use a double-lock Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 22/56] drm/i915: Restructure priority inheritance Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 23/56] drm/i915/selftests: Measure set-priority duration Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 24/56] drm/i915/selftests: Exercise priority inheritance around an engine loop Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 25/56] drm/i915: Improve DFS for priority inheritance Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 26/56] drm/i915: Extract request submission from execlists Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 27/56] drm/i915: Extract request rewinding " Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 28/56] drm/i915: Extract request suspension from the execlists backend Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 29/56] drm/i915: Extract the ability to defer and rerun a request later Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 30/56] drm/i915: Fix the iterative dfs for defering requests Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 31/56] drm/i915: Move common active lists from engine to i915_scheduler Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 32/56] drm/i915: Move scheduler queue Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 33/56] drm/i915: Move tasklet from execlists to sched Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 34/56] drm/i915: Replace priolist rbtree with a skiplist Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 35/56] drm/i915: Wrap cmpxchg64 with try_cmpxchg64() helper Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 36/56] drm/i915: Fair low-latency scheduling Chris Wilson
2021-01-07 16:05   ` Matthew Brost
2021-01-07 16:45     ` Chris Wilson
2021-01-07 17:10       ` Matthew Brost [this message]
2020-12-29 12:01 ` [Intel-gfx] [PATCH 37/56] drm/i915/gt: Specify a deadline for the heartbeat Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 38/56] drm/i915: Extend the priority boosting for the display with a deadline Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 39/56] drm/i915/gt: Support virtual engine queues Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 40/56] drm/i915: Move saturated workload detection back to the context Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 41/56] drm/i915: Bump default timeslicing quantum to 5ms Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 42/56] drm/i915/gt: Wrap intel_timeline.has_initial_breadcrumb Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 43/56] drm/i915/gt: Track timeline GGTT offset separately from subpage offset Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 44/56] drm/i915/gt: Add timeline "mode" Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 45/56] drm/i915/gt: Use indices for writing into relative timelines Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 46/56] drm/i915/selftests: Exercise relative timeline modes Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 47/56] drm/i915/gt: Use ppHWSP for unshared non-semaphore related timelines Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 48/56] Restore "drm/i915: drop engine_pin/unpin_breadcrumbs_irq" Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 49/56] drm/i915/gt: Couple tasklet scheduling for all CS interrupts Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 50/56] drm/i915/gt: Support creation of 'internal' rings Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 51/56] drm/i915/gt: Use client timeline address for seqno writes Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 52/56] drm/i915/gt: Infrastructure for ring scheduling Chris Wilson
2020-12-29 18:34   ` kernel test robot
2020-12-29 12:01 ` [Intel-gfx] [PATCH 53/56] drm/i915/gt: Enable busy-stats for ring-scheduler Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 54/56] drm/i915/gt: Implement ring scheduler for gen6/7 Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 55/56] drm/i915/gt: Enable ring scheduling " Chris Wilson
2020-12-29 12:01 ` [Intel-gfx] [PATCH 56/56] drm/i915/gt: Limit C-states while waiting for requests Chris Wilson
2020-12-29 12:20 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/56] drm/i915/gt: Restore ce->signal flush before releasing virtual engine Patchwork
2020-12-29 12:22 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2020-12-29 12:49 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210107171035.GA9023@sdutt-i7 \
    --to=matthew.brost@intel.com \
    --cc=chris@chris-wilson.co.uk \
    --cc=intel-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox