From: Jeff McGee <jeff.mcgee@intel.com>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: kalyan.kondapally@intel.com, intel-gfx@lists.freedesktop.org,
ben@bwidawsk.net
Subject: Re: [RFC 0/8] Force preemption
Date: Thu, 22 Mar 2018 12:08:40 -0700
Message-ID: <20180322190840.GA2879@jeffdesk>
In-Reply-To: <cd26f404-4f64-4a30-7aa9-e4d547308157@linux.intel.com>

On Thu, Mar 22, 2018 at 05:41:57PM +0000, Tvrtko Ursulin wrote:
>
> On 22/03/2018 16:01, Jeff McGee wrote:
> >On Thu, Mar 22, 2018 at 03:57:49PM +0000, Tvrtko Ursulin wrote:
> >>
> >>On 22/03/2018 14:34, Jeff McGee wrote:
> >>>On Thu, Mar 22, 2018 at 09:28:00AM +0000, Chris Wilson wrote:
> >>>>Quoting Tvrtko Ursulin (2018-03-22 09:22:55)
> >>>>>
> >>>>>On 21/03/2018 17:26, jeff.mcgee@intel.com wrote:
> >>>>>>From: Jeff McGee <jeff.mcgee@intel.com>
> >>>>>>
> >>>>>>Force preemption uses engine reset to enforce a limit on the time
> >>>>>>that a request targeted for preemption can block. This feature is
> >>>>>>a requirement in automotive systems where the GPU may be shared by
> >>>>>>clients of critically high priority and clients of low priority that
> >>>>>>may not have been curated to be preemption friendly. There may be
> >>>>>>more general applications of this feature. I'm sharing as an RFC to
> >>>>>>stimulate that discussion and also to get any technical feedback
> >>>>>>that I can before submitting to the product kernel that needs this.
> >>>>>>I have developed the patches for ease of rebase, given that this is
> >>>>>>for the moment considered a non-upstreamable feature. It would be
> >>>>>>possible to refactor hangcheck to fully incorporate force preemption
> >>>>>>as another tier of patience (or impatience) with the running request.
> >>>>>
> >>>>>Sorry if it was mentioned elsewhere and I missed it - but does this work
> >>>>>only with stateless clients - or in other words, what would happen to
> >>>>>stateful clients which would be force preempted? Or the answer is we
> >>>>>don't care since they are misbehaving?
> >>>>
> >>>>They get notified of being guilty for causing a gpu reset; three strikes
> >>>>and they are out (banned from using the gpu) using the current rules.
> >>>>This is a very blunt hammer that requires the rest of the system to be
> >>>>robust; one might argue time spent making the system robust would be
> >>>>better served making sure that the timer never expired in the first place
> >>>>thereby eliminating the need for a forced gpu reset.
> >>>>-Chris
> >>>
> >>>Yes, for simplification the policy applied to force preempted contexts
> >>>is the same as for hanging contexts. It is known that this feature
> >>>should not be required in a fully curated system. It's a requirement
> >>>if end user will be allowed to install 3rd party apps to run in the
> >>>non-critical domain.
> >>
> >>My concern is whether it is safe to call this force _preemption_, while
> >>it is not really expected to work as preemption from the point of
> >>view of preempted context. I may be missing some angle here, but I
> >>think a better name would include words like maximum request
> >>duration or something.
> >>
> >>I can see a difference between allowed maximum duration when there
> >>is something else pending, and when it isn't, but I don't
> >>immediately see that we should consider this distinction for any
> >>real benefit?
> >>
> >>So should the feature just be "maximum request duration"? This would
> >>perhaps make it just a special case of hangcheck, which ignores head
> >>progress, or whatever we do in there.
> >>
> >>Regards,
> >>
> >>Tvrtko
> >
> >I think you might be unclear about how this works. We're not starting a
> >preemption to see if we can cleanly remove a request who has begun to
> >exceed its normal time slice, i.e. hangcheck. This is about bounding
> >the time that a normal preemption can take. So first start preemption
> >in response to higher-priority request arrival, then wait for preemption
> >to complete within a certain amount of time. If it does not, resort to
> >reset.
> >
> >So it's really "force the resolution of a preemption", shortened to
> >"force preemption".
>
> You are right, I veered off in my thinking and ended up with
> something different. :)
>
> I however still think the name is potentially misleading, since the
> request/context is not getting preempted. It is getting effectively
> killed (sooner or later, directly or indirectly).
>
> Maybe that is OK for the specific use case when everything is only
> broken and not malicious.
>
> In a more general purpose system it would be a bit random when
> something would work, and when it wouldn't, depending on system
> setup and even timings.
>
> Hm, maybe you don't even really benefit from the standard three
> strikes and you are out policy, and for this specific use case, you
> should just kill it straight away. If it couldn't be preempted once,
> why pay the penalty any more?
>
> If you don't have it already, devising a solution which blacklists
> the process (if it creates more contexts), or even a parent (if
> forking is applicable and implementation feasible), for offenders
> could also be beneficial.
>
> Regards,
>
> Tvrtko

Fair enough. There wasn't a lot of deliberation on this name. We
referred to it in various ways during development. I think I started
using "force preemption" because it was short. "reset to preempt" was
another phrase that was used.

The handling of the guilty client/context could be tailored more. Like
I said, it was easiest to start with the same sort of handling that we
already have for hang scenarios. Simple is good when you are rebasing
without much hope of upstreaming. :(

If there was interest in upstreaming this capability, we could certainly
incorporate it nicely within a refactoring of hangcheck. And then we
wouldn't even need a special name for it. The whole thing could be recast
as time slice management, where your slice is condition-based. You get
unlimited time if no one wants the engine, X time if someone of equal
priority wants the engine, and Y time if someone of higher priority
wants the engine, etc. Where 'Y' is analogous to the fpreempt_timeout
value in my RFC.
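
A rough sketch of that condition-based slice selection, for illustration
only. This is not code from the RFC: the struct, the function name, the
millisecond units, and the choice to give unlimited time when only a
lower-priority request is waiting are all assumptions. Only the name
fpreempt_timeout comes from the RFC.

```c
#include <stdint.h>

/* Sentinel meaning "no limit on how long the running request may hold the engine". */
#define SLICE_UNLIMITED UINT64_MAX

/* Hypothetical policy knobs; units of milliseconds are an assumption. */
struct slice_policy {
	uint64_t equal_prio_timeout_ms; /* "X": a waiter of equal priority exists */
	uint64_t fpreempt_timeout_ms;   /* "Y": a waiter of higher priority exists */
};

/*
 * Pick the time slice for the running request.
 * has_waiter is non-zero if any other request wants the engine;
 * waiter_prio is the priority of the most important waiter
 * (a larger value means higher priority).
 */
static uint64_t slice_for(const struct slice_policy *p, int running_prio,
			  int has_waiter, int waiter_prio)
{
	/* Assumption: a lower-priority waiter does not bound the slice. */
	if (!has_waiter || waiter_prio < running_prio)
		return SLICE_UNLIMITED;

	if (waiter_prio == running_prio)
		return p->equal_prio_timeout_ms;

	/* Higher-priority waiter: the force preemption bound applies. */
	return p->fpreempt_timeout_ms;
}
```

With a policy of X = 100 and Y = 10, a request running at priority 0 would
get 10 once a priority 1 request arrives, and no limit while the engine is
otherwise idle.
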
Jeff