From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 07/27] drm/i915: Squash repeated awaits on the same fence
Date: Wed, 26 Apr 2017 13:13:41 +0100 [thread overview]
Message-ID: <7c70131c-e660-556c-e3d6-52fe3dca7ff3@linux.intel.com> (raw)
In-Reply-To: <20170426111849.GP11432@nuc-i3427.alporthouse.com>
On 26/04/2017 12:18, Chris Wilson wrote:
> On Wed, Apr 26, 2017 at 11:54:08AM +0100, Tvrtko Ursulin wrote:
>>>>> +/* struct intel_timeline_sync is a layer of a radixtree that maps a u64 fence
>>>>> + * context id to the last u32 fence seqno waited upon from that context.
>>>>> + * Unlike lib/radixtree it uses a parent pointer that allows traversal back to
>>>>> + * the root. This allows us to access the whole tree via a single pointer
>>>>> + * to the most recently used layer. We expect fence contexts to be dense
>>>>> + * and most reuse to be on the same i915_gem_context but on neighbouring
>>>>> + * engines (i.e. on adjacent contexts) and reuse the same leaf, a very
>>>>> + * effective lookup cache. If the new lookup is not on the same leaf, we
>>>>> + * expect it to be on the neighbouring branch.
>>>>> + *
>>>>> + * A leaf holds an array of u32 seqno, and has height 0. The bitmap field
>>>>> + * allows us to store whether a particular seqno is valid (i.e. allows us
>>>>> + * to distinguish unset from 0).
>>>>> + *
>>>>> + * A branch holds an array of layer pointers, and has height > 0, and always
>>>>> + * has at least 2 layers (either branches or leaves) below it.
>>>>> + *
>>>>> + */
>>>>
>>>> @_@ :)
>>>>
>>>> Ok, so a map of u64 to u32. We can't use IDR or radixtree directly
>>>> because of u64 keys. :( How about a hash table? It would be much
>>>> simpler to review. :) Seriously, if it would perform close enough it
>>>> would be a much much simpler implementation.
>>>
>>> You want a resizable hashtable. rht is appallingly slow, so you want a
>>> custom resizeable ht. They are not as simple as this codewise ;)
>>> (Plus a compressed radixtree is part of my plan for scalability
>>> improvements for struct reservation_object.)
>>
>> Why resizable? I was thinking a normal one. If at any given time we
>> have an active set of contexts, or at least lookups are as you say
>> below, to neighbouring contexts, that would mean we are talking
>> about lookups to different hash buckets. And for the typical
>> working set we would expect many collisions so longer lists in each
>> bucket? So maybe NUM_ENGINES * some typical load constant number
>> buckets would not be that bad?
>
> Consider a long running display server that will accumulate 10,000s of
> thousands of clients in its lifetime, each with their own contents that
> get shared by passing around fences/framebuffers. (Or on a shorter scale
> any of the context stress tests in igt.) Due to the non-recycling of the
> context ids, we can grow to very large tables - but we have no knowledge
> of what contexts are no longer required.
>
> To compensate we need to occasionally prune the sync points. For a ht we
> could just scrap it, For an idr, we could store last use and delete
> stale leaves.
Hm, pruning yes.. but you don't have pruning in this patch either. So
that's something which needs to be addressed either way.
> But first we have a question of how many buckets do we give the static
> ht? Most processes will be sharing between 2 contexts (render context,
> presentation context) except for a display server who may have 10-100 of
> clients - and possibly where eliminating repeated syncs is going to be
> most valuable. That suggests 256 buckets for every timeline (assuming
> just a pair of engines across shared contexts). Overkill for the
> majority, and going to be miserable in stress tests.
>
> Finally what do you store in the ht? Fences are the obvious candidate,
> but need reaping. Or you just create a ht for the request and only
> squash repeated fences inside a single request - that doesn't benefit
> from timeline tracking and squashing between requests (but does avoid
> keeping fences around forever). Hence why I went with tracking seqno. To
> avoid allocations for the ht nodes, we could create an open-addressed ht
> with the {context, seqno} embedded in it. It would be efficient, but
> needs online resizing and is a fair chunk of new code.
I was thinking of exactly the same thing as this patch does, u64 context
id as key, u32 seqnos (wrapped in a container with hlist_node).
Ok, so the key is pruning to keep the display server scenario in check.
Free the key from i915_fence_release?
Regards,
Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
next prev parent reply other threads:[~2017-04-26 12:13 UTC|newest]
Thread overview: 95+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-04-19 9:41 Confluence of eb + timeline improvements Chris Wilson
2017-04-19 9:41 ` [PATCH 01/27] drm/i915/selftests: Allocate inode/file dynamically Chris Wilson
2017-04-20 7:42 ` Joonas Lahtinen
2017-04-19 9:41 ` [PATCH 02/27] drm/i915: Mark CPU cache as dirty on every transition for CPU writes Chris Wilson
2017-04-19 16:52 ` Dongwon Kim
2017-04-19 17:15 ` Chris Wilson
2017-04-19 17:46 ` Chris Wilson
2017-04-19 18:08 ` Chris Wilson
2017-04-19 18:13 ` Dongwon Kim
2017-04-19 18:26 ` Chris Wilson
2017-04-19 20:30 ` Dongwon Kim
2017-04-19 20:49 ` Dongwon Kim
2017-04-19 9:41 ` [PATCH 03/27] drm/i915: Mark up clflushes as belonging to an unordered timeline Chris Wilson
2017-04-19 9:41 ` [PATCH 04/27] drm/i915: Lift timeline ordering to await_dma_fence Chris Wilson
2017-04-19 9:41 ` [PATCH 05/27] drm/i915: Make ptr_unpack_bits() more function-like Chris Wilson
2017-04-19 9:41 ` [PATCH 06/27] drm/i915: Redefine ptr_pack_bits() and friends Chris Wilson
2017-04-19 9:41 ` [PATCH 07/27] drm/i915: Squash repeated awaits on the same fence Chris Wilson
2017-04-24 13:03 ` Tvrtko Ursulin
2017-04-24 13:19 ` Chris Wilson
2017-04-24 13:31 ` Chris Wilson
2017-04-26 10:20 ` Tvrtko Ursulin
2017-04-26 10:38 ` Chris Wilson
2017-04-26 10:54 ` Tvrtko Ursulin
2017-04-26 11:18 ` Chris Wilson
2017-04-26 12:13 ` Tvrtko Ursulin [this message]
2017-04-26 12:23 ` Chris Wilson
2017-04-26 14:36 ` Tvrtko Ursulin
2017-04-26 14:55 ` Chris Wilson
2017-04-26 15:04 ` Chris Wilson
2017-04-26 18:56 ` Chris Wilson
2017-04-26 22:22 ` Chris Wilson
2017-04-27 9:20 ` Tvrtko Ursulin
2017-04-27 9:47 ` Chris Wilson
2017-04-27 7:06 ` [PATCH v8] " Chris Wilson
2017-04-27 7:14 ` Chris Wilson
2017-04-27 9:50 ` Chris Wilson
2017-04-27 11:42 ` Chris Wilson
2017-04-27 11:48 ` [PATCH v9] " Chris Wilson
2017-04-27 16:47 ` Tvrtko Ursulin
2017-04-27 17:25 ` Chris Wilson
2017-04-27 20:34 ` Chris Wilson
2017-04-27 20:53 ` Chris Wilson
2017-04-28 7:41 ` [PATCH v10] " Chris Wilson
2017-04-28 7:59 ` Chris Wilson
2017-04-28 9:32 ` Tvrtko Ursulin
2017-04-28 9:54 ` Chris Wilson
2017-04-28 9:55 ` Tvrtko Ursulin
2017-04-28 10:11 ` Chris Wilson
2017-04-28 14:12 ` [PATCH v13] " Chris Wilson
2017-04-28 19:02 ` [PATCH v14] " Chris Wilson
2017-05-02 12:24 ` Tvrtko Ursulin
2017-05-02 14:45 ` Chris Wilson
2017-05-02 15:11 ` Chris Wilson
2017-05-02 15:17 ` Tvrtko Ursulin
2017-05-02 14:50 ` Chris Wilson
2017-04-19 9:41 ` [PATCH 08/27] drm/i915: Rename intel_timeline.sync_seqno[] to .global_sync[] Chris Wilson
2017-04-19 9:41 ` [PATCH 09/27] drm/i915: Confirm the request is still active before adding it to the await Chris Wilson
2017-04-19 9:41 ` [PATCH 10/27] drm/i915: Do not record a successful syncpoint for a dma-await Chris Wilson
2017-04-19 9:41 ` [PATCH 11/27] drm/i915: Switch the global i915.semaphores check to a local predicate Chris Wilson
2017-04-19 9:41 ` [PATCH 12/27] drm/i915: Only report a wakeup if the waiter was truly asleep Chris Wilson
2017-04-20 13:30 ` Tvrtko Ursulin
2017-04-20 13:57 ` Chris Wilson
2017-04-19 9:41 ` [PATCH 13/27] drm/i915/execlists: Pack the count into the low bits of the port.request Chris Wilson
2017-04-20 14:58 ` Tvrtko Ursulin
2017-04-27 14:37 ` Chris Wilson
2017-04-28 12:02 ` Tvrtko Ursulin
2017-04-28 12:21 ` Chris Wilson
2017-04-19 9:41 ` [PATCH 14/27] drm/i915: Don't mark an execlists context-switch when idle Chris Wilson
2017-04-20 8:53 ` Joonas Lahtinen
2017-04-19 9:41 ` [PATCH 15/27] drm/i915: Split execlist priority queue into rbtree + linked list Chris Wilson
2017-04-24 10:28 ` Tvrtko Ursulin
2017-04-24 11:07 ` Chris Wilson
2017-04-24 12:18 ` Chris Wilson
2017-04-24 12:44 ` Tvrtko Ursulin
2017-04-24 13:06 ` Chris Wilson
2017-04-19 9:41 ` [PATCH 16/27] drm/i915: Reinstate reservation_object zapping for batch_pool objects Chris Wilson
2017-04-28 12:20 ` Tvrtko Ursulin
2017-04-19 9:41 ` [PATCH 17/27] drm/i915: Amalgamate execbuffer parameter structures Chris Wilson
2017-04-19 9:41 ` [PATCH 18/27] drm/i915: Use vma->exec_entry as our double-entry placeholder Chris Wilson
2017-04-19 9:41 ` [PATCH 19/27] drm/i915: Split vma exec_link/evict_link Chris Wilson
2017-04-19 9:41 ` [PATCH 20/27] drm/i915: Store a direct lookup from object handle to vma Chris Wilson
2017-04-19 9:41 ` [PATCH 21/27] drm/i915: Pass vma to relocate entry Chris Wilson
2017-04-19 9:41 ` [PATCH 22/27] drm/i915: Eliminate lots of iterations over the execobjects array Chris Wilson
2017-04-20 8:49 ` Joonas Lahtinen
2017-04-19 9:41 ` [PATCH 23/27] drm/i915: First try the previous execbuffer location Chris Wilson
2017-04-19 9:41 ` [PATCH 24/27] drm/i915: Wait upon userptr get-user-pages within execbuffer Chris Wilson
2017-04-19 9:41 ` [PATCH 25/27] drm/i915: Allow execbuffer to use the first object as the batch Chris Wilson
2017-04-19 9:41 ` [PATCH 26/27] drm/i915: Async GPU relocation processing Chris Wilson
2017-04-19 9:41 ` [PATCH 27/27] drm/i915/scheduler: Support user-defined priorities Chris Wilson
2017-04-19 10:09 ` Chris Wilson
2017-04-19 11:07 ` Tvrtko Ursulin
2017-04-19 10:01 ` ✗ Fi.CI.BAT: failure for series starting with [01/27] drm/i915/selftests: Allocate inode/file dynamically Patchwork
2017-04-27 7:27 ` ✓ Fi.CI.BAT: success for series starting with [01/27] drm/i915/selftests: Allocate inode/file dynamically (rev2) Patchwork
2017-04-28 14:31 ` ✓ Fi.CI.BAT: success for series starting with [01/27] drm/i915/selftests: Allocate inode/file dynamically (rev5) Patchwork
2017-04-28 19:22 ` ✓ Fi.CI.BAT: success for series starting with [01/27] drm/i915/selftests: Allocate inode/file dynamically (rev6) Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7c70131c-e660-556c-e3d6-52fe3dca7ff3@linux.intel.com \
--to=tvrtko.ursulin@linux.intel.com \
--cc=chris@chris-wilson.co.uk \
--cc=intel-gfx@lists.freedesktop.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox