From: John Harrison <John.C.Harrison@Intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 43/46] drm/i915: Allocate a status page for each timeline
Date: Wed, 16 Jan 2019 13:06:36 -0800 [thread overview]
Message-ID: <8892d2cc-bd4c-8658-d855-5ba6b9235e61@Intel.com> (raw)
In-Reply-To: <154757781911.30063.11406468337612552966@skylake-alporthouse-com>
On 1/15/2019 10:43, Chris Wilson wrote:
> Quoting John Harrison (2019-01-15 18:17:21)
>> On 1/15/2019 01:50, Chris Wilson wrote:
>>> Quoting John Harrison (2019-01-15 00:56:13)
>>>> On 1/7/2019 03:55, Chris Wilson wrote:
>>>>> +static int alloc_hwsp(struct i915_timeline *timeline)
>>>>> +{
>>>>> + struct drm_i915_private *i915 = timeline->i915;
>>>>> + struct i915_vma *vma;
>>>>> + int offset;
>>>>> +
>>>>> + mutex_lock(&i915->gt.timeline_lock);
>>>>> +
>>>>> +restart:
>>>>> + offset = find_first_cacheline(i915);
>>>>> + if (offset == NBITS && i915->gt.timeline_hwsp) {
>>>>> + i915_vma_put(i915->gt.timeline_hwsp);
>>>>> + i915->gt.timeline_hwsp = NULL;
>>>>> + }
>>>>> +
>>>>> + vma = i915->gt.timeline_hwsp;
>>>>> + if (!vma) {
>>>>> + struct drm_i915_gem_object *bo;
>>>>> +
>>>>> + /* Drop the lock before allocations */
>>>>> + mutex_unlock(&i915->gt.timeline_lock);
>>>>> +
>>>>> + BUILD_BUG_ON(NBITS * CACHELINE_BYTES > PAGE_SIZE);
>>>>> + bo = i915_gem_object_create_internal(i915, PAGE_SIZE);
>>>>> + if (IS_ERR(bo))
>>>>> + return PTR_ERR(bo);
>>>>> +
>>>>> + i915_gem_object_set_cache_level(bo, I915_CACHE_LLC);
>>>>> +
>>>>> + vma = i915_vma_instance(bo, &i915->ggtt.vm, NULL);
>>>>> + if (IS_ERR(vma))
>>>>> + return PTR_ERR(vma);
>>>>> +
>>>>> + mutex_lock(&i915->gt.timeline_lock);
>>>>> + if (i915->gt.timeline_hwsp) {
>>>>> + i915_gem_object_put(bo);
>>>>> + goto restart;
>>>>> + }
>>>>> +
>>>>> + i915->gt.timeline_hwsp = vma;
>>>>> + i915->gt.timeline_free = ~0ull;
>>>>> + offset = 0;
>>>>> + }
>>>>> +
>>>>> + i915->gt.timeline_free &= ~BIT_ULL(offset);
>>>>> +
>>>>> + timeline->hwsp_ggtt = i915_vma_get(vma);
>>>>> + timeline->hwsp_offset = offset * CACHELINE_BYTES;
>>>>> +
>>>>> + mutex_unlock(&i915->gt.timeline_lock);
>>>>> +
>>>>> + return 0;
>>>>> +}
>>>> If I'm reading this correctly then gt.timeline_hwsp/free is a cached
>>>> copy of the most recently allocated but not yet filled bank of seqno
>>>> locations. When it gets full, the i915->gt reference gets dropped and a
>>>> new page is allocated and used up line by line. Meanwhile, each timeline
>>>> has its own private reference to the page, so dropping the i915->gt
>>>> reference is safe. And once the last timeline using a given page is
>>>> freed, the last reference to that page will be dropped and so the page
>>>> itself will also be freed. If a timeline is freed before the currently
>>>> cached page is filled, then that timeline's slot will be released and
>>>> re-used by the next timeline to be created.
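To check my understanding, the free-bitmap scheme above reduces to something
like the following userspace sketch. This is purely illustrative: the names
(free_mask, alloc_cacheline, free_cacheline) are mine, not the patch's, and
the real code of course holds gt.timeline_lock and juggles vma references
around the allocation.

```c
#include <assert.h>
#include <stdint.h>

#define CACHELINE_BYTES	64
#define PAGE_SIZE	4096
#define NBITS		(PAGE_SIZE / CACHELINE_BYTES) /* 64 cachelines/page */

static uint64_t free_mask; /* bit set => that cacheline is available */

static int find_first_cacheline(void)
{
	int i;

	for (i = 0; i < NBITS; i++)
		if (free_mask & (1ull << i))
			return i;

	return NBITS; /* current page exhausted */
}

static int alloc_cacheline(void)
{
	int bit = find_first_cacheline();

	if (bit == NBITS) {
		/* Simulate dropping the old page and allocating a fresh one */
		free_mask = ~0ull;
		bit = 0;
	}

	free_mask &= ~(1ull << bit); /* mark slot as in use */

	return bit * CACHELINE_BYTES; /* byte offset within the page */
}

static void free_cacheline(int offset)
{
	free_mask |= 1ull << (offset / CACHELINE_BYTES);
}
```

i.e. a freed slot in the cached page is handed back out before a new page is
ever allocated, but slots in pages that were dropped from the cache are lost.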
>>>>
>>>> But what about the scenario of a long running system with a small but
>>>> growing number of persistent tasks interspersed with many short lived
>>>> tasks? In that case, you would end up with many sparsely populated pages
>>>> whose free slots will not get re-used. You could have a linked list
>>>> of cached pages. When a page is filled, move it to a 'full' list. When a
>>>> timeline is freed, if its page was on the 'full' list, clear the slot
>>>> and move it back to the 'available' list.
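Roughly the two-list shape I have in mind, again as an illustrative
userspace sketch (hwsp_page and the helpers are hypothetical; the real thing
would use list_head and the gt.timeline_lock):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NBITS 64

struct hwsp_page {
	uint64_t free_mask;		/* bit set => cacheline free */
	struct hwsp_page *next;
};

static struct hwsp_page *available;	/* pages with >= 1 free slot */
static struct hwsp_page *full;		/* fully allocated pages */

static struct hwsp_page *alloc_slot(int *bit)
{
	struct hwsp_page *p = available;
	int i;

	if (!p) { /* no partially free page: allocate a new one */
		p = calloc(1, sizeof(*p));
		p->free_mask = ~0ull;
		available = p;
	}

	for (i = 0; i < NBITS; i++)
		if (p->free_mask & (1ull << i))
			break;
	p->free_mask &= ~(1ull << i);
	*bit = i;

	if (!p->free_mask) { /* page now full: move to the 'full' list */
		available = p->next;
		p->next = full;
		full = p;
	}

	return p;
}

static void free_slot(struct hwsp_page *p, int bit)
{
	if (!p->free_mask) { /* was full: move back to 'available' */
		struct hwsp_page **pp = &full;

		while (*pp != p)
			pp = &(*pp)->next;
		*pp = p->next;

		p->next = available;
		available = p;
	}
	p->free_mask |= 1ull << bit;
}
```

With that, a slot freed from a long-dead batch of timelines gets reused by
the next timeline instead of stranding a sparsely populated page forever.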
>>> Yes. My thinking was a plain slab cache was a quick-and-dirty
>>> improvement over a page-per-timeline. And a freelist would be the next
>>> step.
>>>
>>>> Or is the idea that a worst case of a single page vma allocation per
>>>> timeline is the least of our worries if there is an ever growing number
>>>> of timelines/contexts/users in the system?
>>> Nah, it was just an attempt to quickly reduce the number of allocations,
>>> where the worst case of one page+vma per timeline was the starting
>>> point.
>>>
>>> We should break this patch down into 1) one-page-per-timeline, 2) slab
>>> cache, 3) free list 4) profit.
>>>
>>> At other times we have been wanting to be able to suballocate pages,
>>> something to keep in mind would be extending this to arbitrary cacheline
>>> allocations.
>> The multi-stage approach sounds good. Keep things simple in this patch
>> and then improve the situation later. One thing to be careful of with a
>> cacheline allocator would be to make sure whatever is being converted
>> wasn't using full pages for security reasons. I.e. a page can be private
>> to a process, whereas a cacheline will be shared by many. I guess that
>> would only really apply to allocations being passed to userland, as the
>> kernel is considered secure? Or can a user batch buffer write to arbitrary
>> locations within the ppHWSP and thereby splat someone else's seqno?
> ppHWSP, yes. But for internal allocations, only accessible via the ring
> + GGTT, should be no problem. I agree that we definitely don't want to
> expose subpage sharing across the userspace boundary (all isolation
> controls are only on pages and above).
>
> If userspace wants suballocations, it can (and does) do them for itself
> and should regulate its own sharing.
I'm a little confused. Are you saying that a rogue batch buffer could
splat some other context's ppHWSP seqno or that it can't? It would be
bad if one dodgy user could cause hangchecks in another user's batch by
splatting their seqnos.
>
>>>>> + if (global_hwsp) {
>>>>> + timeline->hwsp_ggtt = i915_vma_get(global_hwsp);
>>>>> + timeline->hwsp_offset = I915_GEM_HWS_SEQNO_ADDR;
>>>>> + } else {
>>>>> + err = alloc_hwsp(timeline);
>>>>> + if (err)
>>>>> + return err;
>>>>> + }
>>>>> +
>>>>> + vaddr = i915_gem_object_pin_map(timeline->hwsp_ggtt->obj, I915_MAP_WB);
>>>>> + if (IS_ERR(vaddr)) { /* leak the cacheline, but will clean up later */
>>>> Can you explain this comment more? Where/when is the later?
>>> On failure here, the cacheline is still marked as allocated in the slab,
>>> but the reference to the page is released. So the backing page will be
>>> released when everyone else finally drops their reference.
>>>
>>> Just laziness, since we have the ability to return the cacheline later
>>> on...
>> Meaning the actual leak is the bit in 'i915->gt.timeline_free' that says
>> this cacheline can or can't be used for the next allocation? Presumably
>> you could do the bit map munging in the case that 'global_hwsp' is null,
>> but the code would certainly be messier for not a lot of gain.
> Having been pointed out that I was being lazy, a bit of refactoring
> later showed how lazy I was.
Does that mean you are going to re-work this patch or follow it up with
a subsequent one?
>
>>>>> @@ -2616,7 +2628,7 @@ static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
>>>>> goto error_deref_obj;
>>>>> }
>>>>>
>>>>> - timeline = i915_timeline_create(ctx->i915, ctx->name);
>>>>> + timeline = i915_timeline_create(ctx->i915, ctx->name, NULL);
>>>> Why does this use the global HWSP rather than a per context one?
>>> .global_hwsp = NULL => it allocates its own HWSP.
>>>
>>> Were you thinking of intel_engine_setup_common() which is still using
>>> the global HWSP at this point in time?
>> Doh. Brain fart. Presumably the engine one will disappear completely? Or
>> is it still needed for legacy mode?
> It (the timeline embedded inside the engine) is killed later, once
> the internal clients (perf/pmu, hangcheck and idling at the last count)
> are ready for the lack of globally ordered execution queue. The single
> ringbuffer + timeline persists for legacy. (Multiple timelines for gen7,
> coming later!)
> -Chris
Sounds good :).
John.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx