Intel-GFX Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 07/27] drm/i915: Squash repeated awaits on the same fence
Date: Wed, 26 Apr 2017 13:13:41 +0100	[thread overview]
Message-ID: <7c70131c-e660-556c-e3d6-52fe3dca7ff3@linux.intel.com> (raw)
In-Reply-To: <20170426111849.GP11432@nuc-i3427.alporthouse.com>


On 26/04/2017 12:18, Chris Wilson wrote:
> On Wed, Apr 26, 2017 at 11:54:08AM +0100, Tvrtko Ursulin wrote:
>>>>> +/* struct intel_timeline_sync is a layer of a radixtree that maps a u64 fence
>>>>> + * context id to the last u32 fence seqno waited upon from that context.
>>>>> + * Unlike lib/radixtree it uses a parent pointer that allows traversal back to
>>>>> + * the root. This allows us to access the whole tree via a single pointer
>>>>> + * to the most recently used layer. We expect fence contexts to be dense
>>>>> + * and most reuse to be on the same i915_gem_context but on neighbouring
>>>>> + * engines (i.e. on adjacent contexts) and reuse the same leaf, a very
>>>>> + * effective lookup cache. If the new lookup is not on the same leaf, we
>>>>> + * expect it to be on the neighbouring branch.
>>>>> + *
>>>>> + * A leaf holds an array of u32 seqno, and has height 0. The bitmap field
>>>>> + * allows us to store whether a particular seqno is valid (i.e. allows us
>>>>> + * to distinguish unset from 0).
>>>>> + *
>>>>> + * A branch holds an array of layer pointers, and has height > 0, and always
>>>>> + * has at least 2 layers (either branches or leaves) below it.
>>>>> + *
>>>>> + */
>>>>
>>>> @_@ :)
>>>>
>>>> Ok, so a map of u64 to u32. We can't use IDR or radixtree directly
>>>> because of u64 keys. :( How about a hash table? It would be much
>>>> simpler to review. :) Seriously, if it would perform close enough it
>>>> would be a much much simpler implementation.
>>>
>>> You want a resizable hashtable. rht is appallingly slow, so you want a
>>> custom resizeable ht. They are not as simple as this codewise ;)
>>> (Plus a compressed radixtree is part of my plan for scalability
>>> improvements for struct reservation_object.)
>>
>> Why resizable? I was thinking a normal one. If at any given time we
>> have an active set of contexts, or at least lookups are as you say
>> below, to neighbouring contexts, that would mean we are talking
>> about lookups to different hash buckets.  And for the typical
>> working set we would expect many collisions so longer lists in each
>> bucket? So maybe NUM_ENGINES * some typical load constant number
>> buckets would not be that bad?
>
> Consider a long running display server that will accumulate 10,000s of
> thousands of clients in its lifetime, each with their own contents that
> get shared by passing around fences/framebuffers. (Or on a shorter scale
> any of the context stress tests in igt.) Due to the non-recycling of the
> context ids, we can grow to very large tables - but we have no knowledge
> of what contexts are no longer required.
>
> To compensate we need to occasionally prune the sync points. For a ht we
> could just scrap it, For an idr, we could store last use and delete
> stale leaves.

Hm, pruning yes.. but you don't have pruning in this patch either. So 
that's something which needs to be addressed either way.

> But first we have a question of how many buckets do we give the static
> ht? Most processes will be sharing between 2 contexts (render context,
> presentation context) except for a display server who may have 10-100 of
> clients - and possibly where eliminating repeated syncs is going to be
> most valuable. That suggests 256 buckets for every timeline (assuming
> just a pair of engines across shared contexts). Overkill for the
> majority, and going to be miserable in stress tests.
>
> Finally what do you store in the ht? Fences are the obvious candidate,
> but need reaping. Or you just create a ht for the request and only
> squash repeated fences inside a single request - that doesn't benefit
> from timeline tracking and squashing between requests (but does avoid
> keeping fences around forever). Hence why I went with tracking seqno. To
> avoid allocations for the ht nodes, we could create an open-addressed ht
> with the {context, seqno} embedded in it. It would be efficient, but
> needs online resizing and is a fair chunk of new code.

I was thinking of exactly the same thing as this patch does, u64 context 
id as key, u32 seqnos (wrapped in a container with hlist_node).

Ok, so the key is pruning to keep the display server scenario in check.

Free the key from i915_fence_release?

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2017-04-26 12:13 UTC|newest]

Thread overview: 95+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-19  9:41 Confluence of eb + timeline improvements Chris Wilson
2017-04-19  9:41 ` [PATCH 01/27] drm/i915/selftests: Allocate inode/file dynamically Chris Wilson
2017-04-20  7:42   ` Joonas Lahtinen
2017-04-19  9:41 ` [PATCH 02/27] drm/i915: Mark CPU cache as dirty on every transition for CPU writes Chris Wilson
2017-04-19 16:52   ` Dongwon Kim
2017-04-19 17:15     ` Chris Wilson
2017-04-19 17:46     ` Chris Wilson
2017-04-19 18:08     ` Chris Wilson
2017-04-19 18:13       ` Dongwon Kim
2017-04-19 18:26         ` Chris Wilson
2017-04-19 20:30           ` Dongwon Kim
2017-04-19 20:49           ` Dongwon Kim
2017-04-19  9:41 ` [PATCH 03/27] drm/i915: Mark up clflushes as belonging to an unordered timeline Chris Wilson
2017-04-19  9:41 ` [PATCH 04/27] drm/i915: Lift timeline ordering to await_dma_fence Chris Wilson
2017-04-19  9:41 ` [PATCH 05/27] drm/i915: Make ptr_unpack_bits() more function-like Chris Wilson
2017-04-19  9:41 ` [PATCH 06/27] drm/i915: Redefine ptr_pack_bits() and friends Chris Wilson
2017-04-19  9:41 ` [PATCH 07/27] drm/i915: Squash repeated awaits on the same fence Chris Wilson
2017-04-24 13:03   ` Tvrtko Ursulin
2017-04-24 13:19     ` Chris Wilson
2017-04-24 13:31       ` Chris Wilson
2017-04-26 10:20   ` Tvrtko Ursulin
2017-04-26 10:38     ` Chris Wilson
2017-04-26 10:54       ` Tvrtko Ursulin
2017-04-26 11:18         ` Chris Wilson
2017-04-26 12:13           ` Tvrtko Ursulin [this message]
2017-04-26 12:23             ` Chris Wilson
2017-04-26 14:36               ` Tvrtko Ursulin
2017-04-26 14:55                 ` Chris Wilson
2017-04-26 15:04                 ` Chris Wilson
2017-04-26 18:56             ` Chris Wilson
2017-04-26 22:22               ` Chris Wilson
2017-04-27  9:20                 ` Tvrtko Ursulin
2017-04-27  9:47                   ` Chris Wilson
2017-04-27  7:06   ` [PATCH v8] " Chris Wilson
2017-04-27  7:14     ` Chris Wilson
2017-04-27  9:50     ` Chris Wilson
2017-04-27 11:42       ` Chris Wilson
2017-04-27 11:48     ` [PATCH v9] " Chris Wilson
2017-04-27 16:47       ` Tvrtko Ursulin
2017-04-27 17:25         ` Chris Wilson
2017-04-27 20:34           ` Chris Wilson
2017-04-27 20:53             ` Chris Wilson
2017-04-28  7:41       ` [PATCH v10] " Chris Wilson
2017-04-28  7:59         ` Chris Wilson
2017-04-28  9:32         ` Tvrtko Ursulin
2017-04-28  9:54           ` Chris Wilson
2017-04-28  9:55         ` Tvrtko Ursulin
2017-04-28 10:11           ` Chris Wilson
2017-04-28 14:12         ` [PATCH v13] " Chris Wilson
2017-04-28 19:02           ` [PATCH v14] " Chris Wilson
2017-05-02 12:24             ` Tvrtko Ursulin
2017-05-02 14:45               ` Chris Wilson
2017-05-02 15:11                 ` Chris Wilson
2017-05-02 15:17                 ` Tvrtko Ursulin
2017-05-02 14:50               ` Chris Wilson
2017-04-19  9:41 ` [PATCH 08/27] drm/i915: Rename intel_timeline.sync_seqno[] to .global_sync[] Chris Wilson
2017-04-19  9:41 ` [PATCH 09/27] drm/i915: Confirm the request is still active before adding it to the await Chris Wilson
2017-04-19  9:41 ` [PATCH 10/27] drm/i915: Do not record a successful syncpoint for a dma-await Chris Wilson
2017-04-19  9:41 ` [PATCH 11/27] drm/i915: Switch the global i915.semaphores check to a local predicate Chris Wilson
2017-04-19  9:41 ` [PATCH 12/27] drm/i915: Only report a wakeup if the waiter was truly asleep Chris Wilson
2017-04-20 13:30   ` Tvrtko Ursulin
2017-04-20 13:57     ` Chris Wilson
2017-04-19  9:41 ` [PATCH 13/27] drm/i915/execlists: Pack the count into the low bits of the port.request Chris Wilson
2017-04-20 14:58   ` Tvrtko Ursulin
2017-04-27 14:37     ` Chris Wilson
2017-04-28 12:02       ` Tvrtko Ursulin
2017-04-28 12:21         ` Chris Wilson
2017-04-19  9:41 ` [PATCH 14/27] drm/i915: Don't mark an execlists context-switch when idle Chris Wilson
2017-04-20  8:53   ` Joonas Lahtinen
2017-04-19  9:41 ` [PATCH 15/27] drm/i915: Split execlist priority queue into rbtree + linked list Chris Wilson
2017-04-24 10:28   ` Tvrtko Ursulin
2017-04-24 11:07     ` Chris Wilson
2017-04-24 12:18       ` Chris Wilson
2017-04-24 12:44       ` Tvrtko Ursulin
2017-04-24 13:06         ` Chris Wilson
2017-04-19  9:41 ` [PATCH 16/27] drm/i915: Reinstate reservation_object zapping for batch_pool objects Chris Wilson
2017-04-28 12:20   ` Tvrtko Ursulin
2017-04-19  9:41 ` [PATCH 17/27] drm/i915: Amalgamate execbuffer parameter structures Chris Wilson
2017-04-19  9:41 ` [PATCH 18/27] drm/i915: Use vma->exec_entry as our double-entry placeholder Chris Wilson
2017-04-19  9:41 ` [PATCH 19/27] drm/i915: Split vma exec_link/evict_link Chris Wilson
2017-04-19  9:41 ` [PATCH 20/27] drm/i915: Store a direct lookup from object handle to vma Chris Wilson
2017-04-19  9:41 ` [PATCH 21/27] drm/i915: Pass vma to relocate entry Chris Wilson
2017-04-19  9:41 ` [PATCH 22/27] drm/i915: Eliminate lots of iterations over the execobjects array Chris Wilson
2017-04-20  8:49   ` Joonas Lahtinen
2017-04-19  9:41 ` [PATCH 23/27] drm/i915: First try the previous execbuffer location Chris Wilson
2017-04-19  9:41 ` [PATCH 24/27] drm/i915: Wait upon userptr get-user-pages within execbuffer Chris Wilson
2017-04-19  9:41 ` [PATCH 25/27] drm/i915: Allow execbuffer to use the first object as the batch Chris Wilson
2017-04-19  9:41 ` [PATCH 26/27] drm/i915: Async GPU relocation processing Chris Wilson
2017-04-19  9:41 ` [PATCH 27/27] drm/i915/scheduler: Support user-defined priorities Chris Wilson
2017-04-19 10:09   ` Chris Wilson
2017-04-19 11:07     ` Tvrtko Ursulin
2017-04-19 10:01 ` ✗ Fi.CI.BAT: failure for series starting with [01/27] drm/i915/selftests: Allocate inode/file dynamically Patchwork
2017-04-27  7:27 ` ✓ Fi.CI.BAT: success for series starting with [01/27] drm/i915/selftests: Allocate inode/file dynamically (rev2) Patchwork
2017-04-28 14:31 ` ✓ Fi.CI.BAT: success for series starting with [01/27] drm/i915/selftests: Allocate inode/file dynamically (rev5) Patchwork
2017-04-28 19:22 ` ✓ Fi.CI.BAT: success for series starting with [01/27] drm/i915/selftests: Allocate inode/file dynamically (rev6) Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7c70131c-e660-556c-e3d6-52fe3dca7ff3@linux.intel.com \
    --to=tvrtko.ursulin@linux.intel.com \
    --cc=chris@chris-wilson.co.uk \
    --cc=intel-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox