From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Jeff McGee <jeff.mcgee@intel.com>
Cc: kalyan.kondapally@intel.com, intel-gfx@lists.freedesktop.org,
	ben@bwidawsk.net
Subject: Re: [RFC 0/8] Force preemption
Date: Thu, 22 Mar 2018 17:41:57 +0000
Message-ID: <cd26f404-4f64-4a30-7aa9-e4d547308157@linux.intel.com>
In-Reply-To: <20180322160137.GN19343@jeffdesk>


On 22/03/2018 16:01, Jeff McGee wrote:
> On Thu, Mar 22, 2018 at 03:57:49PM +0000, Tvrtko Ursulin wrote:
>>
>> On 22/03/2018 14:34, Jeff McGee wrote:
>>> On Thu, Mar 22, 2018 at 09:28:00AM +0000, Chris Wilson wrote:
>>>> Quoting Tvrtko Ursulin (2018-03-22 09:22:55)
>>>>>
>>>>> On 21/03/2018 17:26, jeff.mcgee@intel.com wrote:
>>>>>> From: Jeff McGee <jeff.mcgee@intel.com>
>>>>>>
>>>>>> Force preemption uses engine reset to enforce a limit on the time
>>>>>> that a request targeted for preemption can block. This feature is
>>>>>> a requirement in automotive systems where the GPU may be shared by
>>>>>> clients of critically high priority and clients of low priority that
>>>>>> may not have been curated to be preemption friendly. There may be
>>>>>> more general applications of this feature. I'm sharing as an RFC to
>>>>>> stimulate that discussion and also to get any technical feedback
>>>>>> that I can before submitting to the product kernel that needs this.
>>>>>> I have developed the patches for ease of rebase, given that this is
>>>>>> for the moment considered a non-upstreamable feature. It would be
>>>>>> possible to refactor hangcheck to fully incorporate force preemption
>>>>>> as another tier of patience (or impatience) with the running request.
>>>>>
>>>>> Sorry if it was mentioned elsewhere and I missed it - but does this work
>>>>> only with stateless clients - or in other words, what would happen to
>>>>> stateful clients which would be force preempted? Or the answer is we
>>>>> don't care since they are misbehaving?
>>>>
>>>> They get notified of being guilty for causing a gpu reset; three strikes
>>>> and they are out (banned from using the gpu) using the current rules.
>>>> This is a very blunt hammer that requires the rest of the system to be
>>>> robust; one might argue time spent making the system robust would be
>>>> better served making sure that the timer never expired in the first place
>>>> thereby eliminating the need for a forced gpu reset.
>>>> -Chris
>>>
>>> Yes, for simplification the policy applied to force preempted contexts
>>> is the same as for hanging contexts. It is known that this feature
>>> should not be required in a fully curated system. It's a requirement
>>> if the end user will be allowed to install 3rd party apps to run in the
>>> non-critical domain.
>>
>> My concern is whether it is safe to call this force _preemption_, while
>> it is not really expected to work as preemption from the point of
>> view of preempted context. I may be missing some angle here, but I
>> think a better name would include words like maximum request
>> duration or something.
>>
>> I can see a difference between allowed maximum duration when there
>> is something else pending, and when it isn't, but I don't
>> immediately see that we should consider this distinction for any
>> real benefit?
>>
>> So should the feature just be "maximum request duration"? This would
>> perhaps make it just a special case of hangcheck, which ignores head
>> progress, or whatever we do in there.
>>
>> Regards,
>>
>> Tvrtko
> 
> I think you might be unclear about how this works. We're not starting a
> preemption to see if we can cleanly remove a request that has begun to
> exceed its normal time slice, i.e. hangcheck. This is about bounding
> the time that a normal preemption can take. So first start preemption
> in response to higher-priority request arrival, then wait for preemption
> to complete within a certain amount of time. If it does not, resort to
> reset.
> 
> So it's really "force the resolution of a preemption", shortened to
> "force preemption".

You are right, I veered off in my thinking and ended up with something 
different. :)

I however still think the name is potentially misleading, since the 
request/context is not getting preempted. It is getting effectively 
killed (sooner or later, directly or indirectly).
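
A minimal sketch of the sequence as described above, to make the naming
question concrete. All names here are hypothetical and this is not i915
code; the timeout is assumed to be some tunable bound:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcome of one preemption attempt; not an i915 type. */
enum preempt_outcome {
	PREEMPT_PENDING,   /* still waiting for the running request to yield */
	PREEMPT_COMPLETED, /* request yielded in time and can be resumed later */
	PREEMPT_FORCED,    /* bound exceeded: engine reset, context marked guilty */
};

/* Hypothetical state for one in-flight preemption on an engine. */
struct preempt_state {
	uint64_t started_ns; /* when the higher-priority request triggered it */
	uint64_t timeout_ns; /* assumed tunable bound on preemption latency */
	bool acked;          /* hardware reported the preemption as complete */
};

/* Decide what to do with a pending preemption at time now_ns. */
static enum preempt_outcome
preempt_check(const struct preempt_state *st, uint64_t now_ns)
{
	assert(st != NULL);

	if (st->acked)
		return PREEMPT_COMPLETED;
	if (now_ns - st->started_ns >= st->timeout_ns)
		return PREEMPT_FORCED;
	return PREEMPT_PENDING;
}
```

Only the PREEMPT_COMPLETED path leaves the old request resumable. On the
PREEMPT_FORCED path the request does not come back, which is why the
name reads oddly from the point of view of the affected context.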

Maybe that is OK for the specific use case when everything is only 
broken and not malicious.

In a more general purpose system it would be a bit random when something 
would work, and when it wouldn't, depending on system setup and even 
timings.

Hm, maybe you don't even really benefit from the standard three strikes 
and you are out policy, and for this specific use case, you should just 
kill it straight away. If it couldn't be preempted once, why pay the 
penalty any more?

If you don't have it already, devising a solution which blacklists the 
process (if it creates more contexts), or even a parent (if forking is 
applicable and implementation feasible), for offenders could also be 
beneficial.
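
The two policy ideas above could be sketched roughly as follows. This is
an illustration only: the names are hypothetical, the strike limit of
three is taken from the "three strikes" wording in this thread rather
than from the driver, and tracking per client (instead of per context)
is the assumption that makes the ban survive new contexts:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reasons a client is blamed for an engine reset. */
enum guilt_reason {
	GUILT_HANG,           /* ordinary hang detected by hangcheck */
	GUILT_FORCED_PREEMPT, /* did not yield within the preemption bound */
};

/* Hypothetical per-client record, shared by all of its contexts. */
struct client_record {
	unsigned int strikes; /* resets this client has been blamed for */
	bool banned;          /* once set, new contexts are refused */
};

#define MAX_STRIKES 3 /* assumed value, see the note above */

/* Record a guilty reset; returns true if the client is now banned. */
static bool client_mark_guilty(struct client_record *c, enum guilt_reason why)
{
	assert(c != NULL);

	c->strikes++;
	/* A failed preemption bans immediately, a hang only after MAX_STRIKES. */
	if (why == GUILT_FORCED_PREEMPT || c->strikes >= MAX_STRIKES)
		c->banned = true;
	return c->banned;
}

/* The ban is per client, so creating a fresh context does not escape it. */
static bool client_may_create_context(const struct client_record *c)
{
	assert(c != NULL);
	return !c->banned;
}
```

Extending the record to a parent process would follow the same shape,
assuming the parent can be identified reliably.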

Regards,

Tvrtko
