From: Nick Hoath <nicholas.hoath@intel.com>
To: "intel-gfx@lists.freedesktop.org" <intel-gfx@lists.freedesktop.org>,
    Daniel Vetter <daniel@ffwll.ch>
Subject: Re: [PATCH 1/4] drm/i915: Unify execlist and legacy request life-cycles
Date: Wed, 14 Oct 2015 17:19:11 +0100
Message-ID: <561E807F.7060105@intel.com>
In-Reply-To: <561E69D2.7050900@intel.com>
On 14/10/2015 15:42, Dave Gordon wrote:
> On 13/10/15 12:36, Chris Wilson wrote:
>> On Tue, Oct 13, 2015 at 01:29:56PM +0200, Daniel Vetter wrote:
>>> On Fri, Oct 09, 2015 at 06:23:50PM +0100, Chris Wilson wrote:
>>>> On Fri, Oct 09, 2015 at 07:18:21PM +0200, Daniel Vetter wrote:
>>>>> On Fri, Oct 09, 2015 at 10:45:35AM +0100, Chris Wilson wrote:
>>>>>> On Fri, Oct 09, 2015 at 11:15:08AM +0200, Daniel Vetter wrote:
>>>>>>> My idea was to create a new request for 3. which gets signalled by the
>>>>>>> scheduler in intel_lrc_irq_handler. My idea was that we'd only create
>>>>>>> these when a ctx switch might occur to avoid overhead, but I guess if we
>>>>>>> just outright delay all requests a notch if needed, that might work too. But
>>>>>>> I'm really not sure on the implications of that (i.e. does the hardware
>>>>>>> really unload the ctx if it's idle?), and whether that would fly still with
>>>>>>> the scheduler.
>>>>>>>
>>>>>>> But figuring this one out here seems to be the cornerstone of this reorg.
>>>>>>> Without it we can't just throw contexts onto the active list.
>>>>>>
>>>>>> (Let me see if I understand it correctly)
>>>>>>
>>>>>> Basically the problem is that we can't trust the context object to be
>>>>>> synchronized until after the status interrupt. The way we handled that
>>>>>> for legacy is to track the currently bound context and keep the
>>>>>> vma->pin_count asserted until the request containing the switch away completes.
>>>>>> Doing the same for execlists would trivially fix the issue and if done
>>>>>> smartly allows us to share more code (been there, done that).
>>>>>>
>>>>>> That satisfies me for keeping requests as a basic fence in the GPU
>>>>>> timeline and should keep everyone happy that the context can't vanish
>>>>>> until after it is complete. The only caveat is that we cannot evict the
>>>>>> most recent context. For legacy, we do a switch back to the always
>>>>>> pinned default context. For execlists we don't, but it still means we
>>>>>> should only have one context which cannot be evicted (like legacy). But
>>>>>> it does leave us with the issue that i915_gpu_idle() returns early and
>>>>>> i915_gem_context_fini() must keep the explicit gpu reset to be
>>>>>> absolutely sure that the pending context writes are completed before the
>>>>>> final context is unbound.
>>>>>
>>>>> Yes, and that was what I originally had in mind. Meanwhile the scheduler
>>>>> (will) happen and that means we won't have FIFO ordering. Which means when
>>>>> we switch contexts (as opposed to just adding more to the ringbuffer of
>>>>> the current one) we won't have any idea which context will be the next
>>>>> one. Which also means we don't know which request to pick to retire the
>>>>> old context. Hence why I think we need to be better.
>>>>
>>>> But the scheduler does - it is also in charge of making sure the
>>>> retirement queue is in order. The essence is that we only actually pin
>>>> engine->last_context, which is chosen as we submit stuff to the hw.
>>>
>>> Well I'm not sure how much it will reorder, but I'd expect it wants to
>>> reorder stuff pretty freely. And as soon as it reorders contexts (ofc they
>>> can't depend on one another) then the legacy hw ctx tracking won't work.
>>>
>>> I think at least ...
>>
>> Not the way it is written today, but the principle behind it still
>> stands. The last_context submitted to the hardware is pinned until a new
>> one is submitted (such that it remains bound in the GGTT until after the
>> context switch is complete due to the active reference). Instead of
>> doing the context tracking at the start of the execbuffer, the context
>> tracking needs to be pushed down to the submission backend/middleman.
>> -Chris
>
> Does anyone actually know what guarantees (if any) the GPU provides
> w.r.t access to context images vs. USER_INTERRUPTs and CSB-updated
> interrupts? Does 'active->idle' really mean that the context has been
> fully updated in memory (and can therefore be unmapped), or just that
> the engine has stopped processing (but the context might not be saved
> until it's known that it isn't going to be reactivated).
>
> For example, it could implement this:
>
> (End of last batch in current context)
> 1. Update seqno
> 2. Generate USER_INTERRUPT
> 3. Engine finishes work
> (HEAD == TAIL and no further contexts queued in ELSP)
> 4. Save all per-context registers to context image
> 5. Flush to memory and invalidate
> 6. Update CSB
> 7. Flush to memory
> 8. Generate CSB-update interrupt.
>
> (New batch in same context submitted via ELSP)
> 9. Reload entire context image from memory
> 10. Update CSB
> 11. Generate CSB-update interrupt
>
> Or this:
> 1. Update seqno
> 2. Generate USER_INTERRUPT
> 3. Engine finishes work
> (HEAD == TAIL and no further contexts queued in ELSP)
> 4. Update CSB
> 5. Generate CSB-update interrupt.
>
> (New batch in DIFFERENT context submitted via ELSP)
> 6. Save all per-context registers to old context image
> 7. Load entire context image from new image
> 8. Update CSB
> 9. Generate CSB-update interrupt
>
> The former is synchronous and relatively easy to model, the latter is
> more like the way legacy mode works. And various other permutations are
> possible (sync save vs async save vs deferred save, full reload vs
> lite-restore, etc). So I think we either need to know what really
> happens (and assume future chips will work the same way), or make only
> minimal assumptions and code something that will work no matter how the
> hardware actually behaves. That probably precludes any attempt at
> tracking individual context-switches at the CSB level, which in any case
> aren't passed to the CPU in GuC submission mode.
>
> .Dave.
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
Tracking context via last_context at request retirement.

In LRC/ELSP mode:
    At startup:
        - Double refcount default context
        - Set last_context to default context
    When a request is complete:
        - If last_context == current_context
            - queue request for cleanup
        - If last_context != current_context
            - unref last_context
            - update last_context to current_context
            - queue request for cleanup

What this achieves:
    - Make the code path closer to legacy submission
    - Can now use active_list tracking for contexts & ringbufs

Additional work 1:
    - When there is no work pending on an engine, at some point:
        - Send a nop request on the default context
          This moves last_context to be default context,
          allowing previous last_context to be unref'd

Additional work 2:
    - Change legacy mode to use last_context post request completion
      This will allow us to unify the code paths.
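
To make the refcount flow concrete, here is a rough userspace sketch of the
tracking described above. It is not driver code: toy_ctx, toy_engine and all
toy_* helpers are hypothetical names invented for illustration, and the
refcount is a plain int rather than a kref. One assumption goes beyond the
outline: the sketch takes a new reference on the incoming context at the
point last_context changes, so that every unref has a matching ref. In the
real driver that reference might instead be taken at submission time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a context; not the i915 structure. */
struct toy_ctx {
	int refcount;
};

/* Hypothetical stand-in for an engine that tracks one pinned context. */
struct toy_engine {
	struct toy_ctx *last_context;
};

static void toy_ctx_get(struct toy_ctx *ctx)
{
	ctx->refcount++;
}

static void toy_ctx_put(struct toy_ctx *ctx)
{
	assert(ctx->refcount > 0);
	ctx->refcount--;
}

/* At startup: extra reference on the default context, then track it. */
static void toy_engine_init(struct toy_engine *engine,
			    struct toy_ctx *default_ctx)
{
	toy_ctx_get(default_ctx);
	engine->last_context = default_ctx;
}

/*
 * When a request that ran on 'ctx' completes. Queuing the request for
 * cleanup happens in both branches of the outline and is not modelled.
 */
static void toy_request_complete(struct toy_engine *engine,
				 struct toy_ctx *ctx)
{
	if (engine->last_context != ctx) {
		/* Assumption: reference taken here, see the text above. */
		toy_ctx_get(ctx);
		toy_ctx_put(engine->last_context);
		engine->last_context = ctx;
	}
}

/*
 * "Additional work 1": completing a nop request on the default context
 * moves last_context back to it and releases the previous context.
 */
static void toy_engine_idle_nop(struct toy_engine *engine,
				struct toy_ctx *default_ctx)
{
	toy_request_complete(engine, default_ctx);
}
```

With this model, at most one non-default context is held by last_context at
any time, and the idle nop is what finally lets that context be released
when the engine has nothing else to run.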