From: Nick Hoath <nicholas.hoath@intel.com>
To: "intel-gfx@lists.freedesktop.org"
<intel-gfx@lists.freedesktop.org>,
Daniel Vetter <daniel@ffwll.ch>
Subject: Re: [PATCH 1/4] drm/i915: Unify execlist and legacy request life-cycles
Date: Wed, 14 Oct 2015 17:19:11 +0100
Message-ID: <561E807F.7060105@intel.com>
In-Reply-To: <561E69D2.7050900@intel.com>

On 14/10/2015 15:42, Dave Gordon wrote:
> On 13/10/15 12:36, Chris Wilson wrote:
>> On Tue, Oct 13, 2015 at 01:29:56PM +0200, Daniel Vetter wrote:
>>> On Fri, Oct 09, 2015 at 06:23:50PM +0100, Chris Wilson wrote:
>>>> On Fri, Oct 09, 2015 at 07:18:21PM +0200, Daniel Vetter wrote:
>>>>> On Fri, Oct 09, 2015 at 10:45:35AM +0100, Chris Wilson wrote:
>>>>>> On Fri, Oct 09, 2015 at 11:15:08AM +0200, Daniel Vetter wrote:
>>>>>>> My idea was to create a new request for 3. which gets signalled by the
>>>>>>> scheduler in intel_lrc_irq_handler. My idea was that we'd only create
>>>>>>> these when a ctx switch might occur to avoid overhead, but I guess if we
>>>>>>> just outright delay all requests a notch if need be, that might work too. But
>>>>>>> I'm really not sure on the implications of that (i.e. does the hardware
>>>>>>> really unload the ctx if it's idle?), and whether that would fly still with
>>>>>>> the scheduler.
>>>>>>>
>>>>>>> But figuring this one out here seems to be the cornerstone of this reorg.
>>>>>>> Without it we can't just throw contexts onto the active list.
>>>>>>
>>>>>> (Let me see if I understand it correctly)
>>>>>>
>>>>>> Basically the problem is that we can't trust the context object to be
>>>>>> synchronized until after the status interrupt. The way we handled that
>>>>>> for legacy is to track the currently bound context and keep the
>>>>>> vma->pin_count asserted until the request containing the switch away completes.
>>>>>> Doing the same for execlists would trivially fix the issue and if done
>>>>>> smartly allows us to share more code (been there, done that).
>>>>>>
>>>>>> That satisfies me for keeping requests as a basic fence in the GPU
>>>>>> timeline and should keep everyone happy that the context can't vanish
>>>>>> until after it is complete. The only caveat is that we cannot evict the
>>>>>> most recent context. For legacy, we do a switch back to the always
>>>>>> pinned default context. For execlists we don't, but it still means we
>>>>>> should only have one context which cannot be evicted (like legacy). But
>>>>>> it does leave us with the issue that i915_gpu_idle() returns early and
>>>>>> i915_gem_context_fini() must keep the explicit gpu reset to be
>>>>>> absolutely sure that the pending context writes are completed before the
>>>>>> final context is unbound.
>>>>>
>>>>> Yes, and that was what I originally had in mind. Meanwhile the scheduler
>>>>> (will) happen and that means we won't have FIFO ordering. Which means when
>>>>> we switch contexts (as opposed to just adding more to the ringbuffer of
>>>>> the current one) we won't have any idea which context will be the next
>>>>> one. Which also means we don't know which request to pick to retire the
>>>>> old context. Hence why I think we need to be better.
>>>>
>>>> But the scheduler does - it is also in charge of making sure the
>>>> retirement queue is in order. The essence is that we only actually pin
>>>> engine->last_context, which is chosen as we submit stuff to the hw.
>>>
>>> Well I'm not sure how much it will reorder, but I'd expect it wants to
>>> reorder stuff pretty freely. And as soon as it reorders contexts (of course,
>>> they can't depend on one another), the legacy hw ctx tracking won't work.
>>>
>>> I think at least ...
>>
>> Not the way it is written today, but the principle behind it still
>> stands. The last_context submitted to the hardware is pinned until a new
>> one is submitted (such that it remains bound in the GGTT until after the
>> context switch is complete due to the active reference). Instead of
>> doing the context tracking at the start of the execbuffer, the context
>> tracking needs to be pushed down to the submission backend/middleman.
>> -Chris
>
> Does anyone actually know what guarantees (if any) the GPU provides
> w.r.t access to context images vs. USER_INTERRUPTs and CSB-updated
> interrupts? Does 'active->idle' really mean that the context has been
> fully updated in memory (and can therefore be unmapped), or just that
> the engine has stopped processing (but the context might not be saved
> until it's known that it isn't going to be reactivated).
>
> For example, it could implement this:
>
> (End of last batch in current context)
> 1. Update seqno
> 2. Generate USER_INTERRUPT
> 3. Engine finishes work
> (HEAD == TAIL and no further contexts queued in ELSP)
> 4. Save all per-context registers to context image
> 5. Flush to memory and invalidate
> 6. Update CSB
> 7. Flush to memory
> 8. Generate CSB-update interrupt.
>
> (New batch in same context submitted via ELSP)
> 9. Reload entire context image from memory
> 10. Update CSB
> 11. Generate CSB-update interrupt
>
> Or this:
> 1. Update seqno
> 2. Generate USER_INTERRUPT
> 3. Engine finishes work
> (HEAD == TAIL and no further contexts queued in ELSP)
> 4. Update CSB
> 5. Generate CSB-update interrupt.
>
> (New batch in DIFFERENT context submitted via ELSP)
> 6. Save all per-context registers to old context image
> 7. Load entire context image from new image
> 8. Update CSB
> 9. Generate CSB-update interrupt
>
> The former is synchronous and relatively easy to model; the latter is
> more like the way legacy mode works. And various other permutations are
> possible (sync save vs async save vs deferred save, full reload vs
> lite-restore, etc). So I think we either need to know what really
> happens (and assume future chips will work the same way), or make only
> minimal assumptions and code something that will work no matter how the
> hardware actually behaves. That probably precludes any attempt at
> tracking individual context-switches at the CSB level, which in any case
> aren't passed to the CPU in GuC submission mode.
>
> .Dave.
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
>

Tracking context via last_context at request retirement.

In LRC/ELSP mode:
  At startup:
    - Double refcount default context
    - Set last_context to default context
  When a request is complete:
    - If last_context == current_context
      - queue request for cleanup
    - If last_context != current_context
      - unref last_context
      - update last_context to current_context
      - queue request for cleanup
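
To make the retirement rule above concrete, here is a rough userspace sketch in C. It is not driver code: struct ctx, struct engine, struct request and every function name are hypothetical stand-ins, and the reference taken on the new context when it becomes last_context is my assumption (the outline only spells out the unref of the old one).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real i915 structures. */
struct ctx {
	int refcount;
};

struct request {
	struct ctx *ctx;	/* context the request ran on */
};

struct engine {
	struct ctx *default_ctx;
	struct ctx *last_context;
	int cleanup_queued;	/* stand-in for the cleanup queue */
};

static void ctx_ref(struct ctx *c)
{
	c->refcount++;
}

static void ctx_unref(struct ctx *c)
{
	assert(c->refcount > 0);
	c->refcount--;
}

/* At startup: double refcount the default context and track it. */
static void engine_init(struct engine *e, struct ctx *default_ctx)
{
	default_ctx->refcount = 1;	/* owner's reference */
	ctx_ref(default_ctx);		/* second reference, held via last_context */
	e->default_ctx = default_ctx;
	e->last_context = default_ctx;
	e->cleanup_queued = 0;
}

/* When a request is complete. */
static void request_complete(struct engine *e, struct request *rq)
{
	if (e->last_context != rq->ctx) {
		/* We have switched away, so the old context can be released. */
		ctx_unref(e->last_context);
		/* Assumption: last_context holds its own reference. */
		ctx_ref(rq->ctx);
		e->last_context = rq->ctx;
	}
	/* In both cases the request itself is queued for cleanup. */
	e->cleanup_queued++;
}
```

The point of the sketch is that a context is only released once a request from a different context has completed after it, so the most recently run context always stays referenced.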

What this achieves:
  - Make the code path closer to legacy submission
  - Can now use active_list tracking for contexts & ringbufs

Additional work 1:
  - When there is no work pending on an engine, at some point:
    - Send a nop request on the default context
      This moves last_context to be default context,
      allowing previous last_context to be unref'd
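
A similarly rough sketch of the idle case, again with hypothetical names only (engine_idle_nop, retire_on and the pending counter are made up for illustration). The nop request is modelled as completing immediately, which the real hardware would of course not do, and as before the reference taken on the new last_context is my assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are NOT the real i915 structures. */
struct ctx {
	int refcount;
};

struct engine {
	struct ctx *default_ctx;
	struct ctx *last_context;
	int pending;	/* requests submitted but not yet completed */
};

/* Same rule as at request completion: release the old context on a switch. */
static void retire_on(struct engine *e, struct ctx *c)
{
	if (e->last_context != c) {
		assert(e->last_context->refcount > 0);
		e->last_context->refcount--;
		c->refcount++;	/* assumption: last_context holds a reference */
		e->last_context = c;
	}
}

/*
 * Called "at some point" while the engine may be idle.
 * Returns true if a nop on the default context was needed.
 */
static bool engine_idle_nop(struct engine *e)
{
	if (e->pending > 0)
		return false;	/* still busy, nothing to do yet */
	if (e->last_context == e->default_ctx)
		return false;	/* already parked on the default context */

	/* Stand-in for submitting a nop on the default context and
	 * seeing it complete. */
	retire_on(e, e->default_ctx);
	return true;
}
```

After the nop completes, the only context still held through last_context is the default one, so the previous last_context can be released.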

Additional work 2:
  - Change legacy mode to use last_context post request completion
    This will allow us to unify the code paths.