From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 20/27] drm/i915: Remove logical HW ID
Date: Wed, 25 Sep 2019 15:38:59 +0100 [thread overview]
Message-ID: <a32e4f08-eb84-7807-5aa7-e2e4e3b5fdf8@linux.intel.com> (raw)
In-Reply-To: <156941589825.4979.9378610445541130440@skylake-alporthouse-com>
On 25/09/2019 13:51, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-09-25 13:41:10)
>> [+ Daniele, I think he might want to have a look at this.]
>>
>> On 25/09/2019 11:01, Chris Wilson wrote:
>>> With the introduction of ctx->engines[] we allow multiple logical
>>> contexts to be used on the same engine (e.g. with virtual engines). Each
>>> logical context requires a unique tag in order for context-switching to
>>> occur correctly between them.
>>>
>>> We only need to keep a unique tag for the active lifetime of the
>>> context, and for as long as we need to identify that context. The HW
>>> uses the tag to determine if it should use a lite-restore (why not the
>>> LRCA?) and passes the tag back for various status identifies. The only
>>> status we need to track is for OA, so when using perf, we assign the
>>> specific context a unique tag.
>>>
>>> Fixes: 976b55f0e1db ("drm/i915: Allow a context to define its set of engines")
>>
>> Fixes? Why?
>
> The above paragraph explains that as we give two distinct contexts the
> same tag then it will only perform (according to bspec) a lite-restore.
Right I see. But not with virtual engines per se, but even if someone
created an engine map with two same engines in it. Virtual engines are
then a super set of this problem.
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
>>> index da4b45aa84b1..7bce176e696c 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
>>> @@ -296,10 +296,12 @@ struct intel_engine_cs {
>>> u8 uabi_class;
>>> u8 uabi_instance;
>>>
>>> + u32 uabi_capabilities;
>>> u32 context_size;
>>> u32 mmio_base;
>>>
>>> - u32 uabi_capabilities;
>>> + unsigned int context_tag;
>>> +#define NUM_CONTEXT_TAG (256)
>>
>> AFAICT this define now defines how deep ELSP we can have before we start
>> getting hard to debug problems. GuC angle needs considering here.
>
> The guc is immaterial here, since they use their own mechanism. It just
> needs to be a value larger than 2 * EXECLIST_MAX_PORTS and less then
> BIT(10).
I don't know how GuC (will) work so hopefully Daniele can confirm or
deny this won't be a problem there.
>
>>> @@ -978,6 +965,15 @@ __execlists_schedule_in(struct i915_request *rq)
>>>
>>> intel_context_get(ce);
>>>
>>> + if (ce->tag) {
>>> + ce->lrc_desc |= (u64)ce->tag << 32;
>>> + } else {
>>> + ce->lrc_desc &= ~GENMASK_ULL(47, 37);
>>> + ce->lrc_desc |=
>>> + (u64)(engine->context_tag++ % NUM_CONTEXT_TAG) <<
>>> + GEN11_SW_CTX_ID_SHIFT;
>>> + }
>>
>> So hw id is valid only while context is in ELSP and it changes with
>> every submission except in the OA case?
>
> Aye. OA being a pita.
>
>> Shifts and masks look dodgy. For >= gen11 the current code has the hw_id
>> in [37, 42], otherwise [32, 52]. This patch does not seem to handle the
>> differences between gens.
>
> Because the values we supply do not matter (they are cookies), just the
> lifetime.
Okay so you are choosing to not use some of the available bits since we
don't need all 20 (pre gen11), or even 10 (gen11+). This definitely
deserves a comment. And another assert with NUM_CONTEXT_TAG relating to
available bits.
>
>>> +
>>> intel_gt_pm_get(engine->gt);
>>> execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_IN);
>>> intel_engine_context_in(engine);
>>> @@ -2224,7 +2220,6 @@ static void execlists_context_unpin(struct intel_context *ce)
>>> check_redzone((void *)ce->lrc_reg_state - LRC_STATE_PN * PAGE_SIZE,
>>> ce->engine);
>>>
>>> - i915_gem_context_unpin_hw_id(ce->gem_context);
>>> i915_gem_object_unpin_map(ce->state->obj);
>>> intel_ring_reset(ce->ring, ce->ring->tail);
>>> }
>>> @@ -2274,18 +2269,12 @@ __execlists_context_pin(struct intel_context *ce,
>>> goto unpin_active;
>>> }
>>>
>>> - ret = i915_gem_context_pin_hw_id(ce->gem_context);
>>> - if (ret)
>>> - goto unpin_map;
>>> -
>>> ce->lrc_desc = lrc_descriptor(ce, engine);
>>> ce->lrc_reg_state = vaddr + LRC_STATE_PN * PAGE_SIZE;
>>> __execlists_update_reg_state(ce, engine);
>>>
>>> return 0;
>>>
>>> -unpin_map:
>>> - i915_gem_object_unpin_map(ce->state->obj);
>>> unpin_active:
>>> intel_context_active_release(ce);
>>> err:
>>> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
>>> index 343d79c1cb7e..04a5a0d90823 100644
>>> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
>>> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
>>> @@ -1564,27 +1564,10 @@ vgpu_id_show(struct device *dev, struct device_attribute *attr,
>>> return sprintf(buf, "\n");
>>> }
>>>
>>> -static ssize_t
>>> -hw_id_show(struct device *dev, struct device_attribute *attr,
>>> - char *buf)
>>> -{
>>> - struct mdev_device *mdev = mdev_from_dev(dev);
>>> -
>>> - if (mdev) {
>>> - struct intel_vgpu *vgpu = (struct intel_vgpu *)
>>> - mdev_get_drvdata(mdev);
>>> - return sprintf(buf, "%u\n",
>>> - vgpu->submission.shadow[0]->gem_context->hw_id);
>>> - }
>>> - return sprintf(buf, "\n");
>>> -}
>>> -
>>> static DEVICE_ATTR_RO(vgpu_id);
>>> -static DEVICE_ATTR_RO(hw_id);
>>>
>>> static struct attribute *intel_vgpu_attrs[] = {
>>> &dev_attr_vgpu_id.attr,
>>> - &dev_attr_hw_id.attr,
>>> NULL
>>> };
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
>>> index fd41505d52ec..8caaa446490f 100644
>>> --- a/drivers/gpu/drm/i915/i915_debugfs.c
>>> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
>>> @@ -1504,9 +1504,6 @@ static int i915_context_status(struct seq_file *m, void *unused)
>>> struct intel_context *ce;
>>>
>>> seq_puts(m, "HW context ");
>>> - if (!list_empty(&ctx->hw_id_link))
>>> - seq_printf(m, "%x [pin %u]", ctx->hw_id,
>>> - atomic_read(&ctx->hw_id_pin_count));
>>> if (ctx->pid) {
>>> struct task_struct *task;
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
>>> index a86d28bda6dd..47239df653f2 100644
>>> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
>>> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
>>> @@ -471,9 +471,9 @@ static void error_print_context(struct drm_i915_error_state_buf *m,
>>> const char *header,
>>> const struct drm_i915_error_context *ctx)
>>> {
>>> - err_printf(m, "%s%s[%d] hw_id %d, prio %d, guilty %d active %d\n",
>>> - header, ctx->comm, ctx->pid, ctx->hw_id,
>>> - ctx->sched_attr.priority, ctx->guilty, ctx->active);
>>> + err_printf(m, "%s%s[%d] prio %d, guilty %d active %d\n",
>>> + header, ctx->comm, ctx->pid, ctx->sched_attr.priority,
>>> + ctx->guilty, ctx->active);
>>> }
>>>
>>> static void error_print_engine(struct drm_i915_error_state_buf *m,
>>> @@ -1262,7 +1262,6 @@ static bool record_context(struct drm_i915_error_context *e,
>>> rcu_read_unlock();
>>> }
>>>
>>> - e->hw_id = ctx->hw_id;
>>> e->sched_attr = ctx->sched;
>>> e->guilty = atomic_read(&ctx->guilty_count);
>>> e->active = atomic_read(&ctx->active_count);
>>> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
>>> index caaed5093d95..4dc36d6ee3a2 100644
>>> --- a/drivers/gpu/drm/i915/i915_gpu_error.h
>>> +++ b/drivers/gpu/drm/i915/i915_gpu_error.h
>>> @@ -117,7 +117,6 @@ struct i915_gpu_state {
>>> struct drm_i915_error_context {
>>> char comm[TASK_COMM_LEN];
>>> pid_t pid;
>>> - u32 hw_id;
>>> int active;
>>> int guilty;
>>> struct i915_sched_attr sched_attr;
>>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>>> index 80055501eccb..d36ba248d438 100644
>>> --- a/drivers/gpu/drm/i915/i915_perf.c
>>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>>> @@ -1283,22 +1283,15 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
>>> } else {
>>> stream->specific_ctx_id_mask =
>>> (1U << GEN8_CTX_ID_WIDTH) - 1;
>>> - stream->specific_ctx_id =
>>> - upper_32_bits(ce->lrc_desc);
>>> - stream->specific_ctx_id &=
>>> - stream->specific_ctx_id_mask;
>>> + stream->specific_ctx_id = stream->specific_ctx_id_mask;
>>> }
>>> break;
>>>
>>> case 11:
>>> case 12: {
>>> stream->specific_ctx_id_mask =
>>> - ((1U << GEN11_SW_CTX_ID_WIDTH) - 1) << (GEN11_SW_CTX_ID_SHIFT - 32) |
>>> - ((1U << GEN11_ENGINE_INSTANCE_WIDTH) - 1) << (GEN11_ENGINE_INSTANCE_SHIFT - 32) |
>>> - ((1 << GEN11_ENGINE_CLASS_WIDTH) - 1) << (GEN11_ENGINE_CLASS_SHIFT - 32);
>>> - stream->specific_ctx_id = upper_32_bits(ce->lrc_desc);
>>> - stream->specific_ctx_id &=
>>> - stream->specific_ctx_id_mask;
>>> + ((1U << GEN11_SW_CTX_ID_WIDTH) - 1) << (GEN11_SW_CTX_ID_SHIFT - 32);
>>> + stream->specific_ctx_id = stream->specific_ctx_id_mask;
>>> break;
>>> }
>>>
>>> @@ -1306,6 +1299,8 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
>>> MISSING_CASE(INTEL_GEN(i915));
>>> }
>>>
>>> + ce->tag = stream->specific_ctx_id_mask;
>>> +
>>> DRM_DEBUG_DRIVER("filtering on ctx_id=0x%x ctx_id_mask=0x%x\n",
>>> stream->specific_ctx_id,
>>> stream->specific_ctx_id_mask);
>>> @@ -1324,12 +1319,14 @@ static void oa_put_render_ctx_id(struct i915_perf_stream *stream)
>>> {
>>> struct intel_context *ce;
>>>
>>> - stream->specific_ctx_id = INVALID_CTX_ID;
>>> - stream->specific_ctx_id_mask = 0;
>>> -
>>> ce = fetch_and_zero(&stream->pinned_ctx);
>>> - if (ce)
>>> + if (ce) {
>>> + ce->tag = 0;
>>> intel_context_unpin(ce);
>>> + }
>>> +
>>> + stream->specific_ctx_id = INVALID_CTX_ID;
>>> + stream->specific_ctx_id_mask = 0;
>>
>> OA hunks looks dodgy as well. ce->tag is set to
>> stream->specific_ctx_id_mask while I think it should be id. Furthermore
>> id is assigned from mask which is fixed and not unique for different
>> contexts.
>
> It is singular in the entire system. The mask is a guaranteed unique
> value.
What is singular? The OA context and it's id? I assumed what needs to be
done is fix hw_id for _all_ contexts while OA is sampling. But on a more
detailed look this assumption does not seem to hold in code. Okay, I
declare to not know how OA works well enough.
>
>>> diff --git a/drivers/gpu/drm/i915/selftests/i915_vma.c b/drivers/gpu/drm/i915/selftests/i915_vma.c
>>> index 1c9db08f7c28..ac1ff558eb90 100644
>>> --- a/drivers/gpu/drm/i915/selftests/i915_vma.c
>>> +++ b/drivers/gpu/drm/i915/selftests/i915_vma.c
>>> @@ -170,7 +170,7 @@ static int igt_vma_create(void *arg)
>>> }
>>>
>>> nc = 0;
>>> - for_each_prime_number(num_ctx, MAX_CONTEXT_HW_ID) {
>>> + for_each_prime_number(num_ctx, 2 * NUM_CONTEXT_TAG) {
>>> for (; nc < num_ctx; nc++) {
>>> ctx = mock_context(i915, "mock");
>>> if (!ctx)
>>>
>>
>> Unless I am missing something I see this patch as simplification and not
>> a fix. If we can get away with it in the GuC world then it's quite a big
>> simplification so it's fine by me.
>
> Bspec says that for lrc_desc with matching tags (at least for gen8-gen11
> I believe), it performs a lite restore (regardless of lrca). Ergo it is
> quite a major fix.
Yes I see that now, my bad. This one will not lend itself well to
backporting.. But I don't think there is a lighter-weight solution.
Moving hw_id to intel_context would also end up large-ish I think.
Because of the locking requirement etc.
Regards,
Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
next prev parent reply other threads:[~2019-09-25 14:39 UTC|newest]
Thread overview: 70+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-09-25 10:01 struct_mutex is over the hill and far away Chris Wilson
2019-09-25 10:01 ` [PATCH 01/27] dma-fence: Serialise signal enabling (dma_fence_enable_sw_signaling) Chris Wilson
2019-09-25 10:01 ` [PATCH 02/27] drm/mm: Pack allocated/scanned boolean into a bitfield Chris Wilson
2019-09-25 10:01 ` [PATCH 03/27] drm/i915: Only track bound elements of the GTT Chris Wilson
2019-09-25 10:01 ` [PATCH 04/27] drm/i915: Mark up address spaces that may need to allocate Chris Wilson
2019-09-25 10:01 ` [PATCH 05/27] drm/i915: Pull i915_vma_pin under the vm->mutex Chris Wilson
2019-09-27 8:47 ` Tvrtko Ursulin
2019-09-27 11:06 ` Chris Wilson
2019-09-25 10:01 ` [PATCH 06/27] drm/i915: Push the i915_active.retire into a worker Chris Wilson
2019-09-25 10:01 ` [PATCH 07/27] drm/i915: Coordinate i915_active with its own mutex Chris Wilson
2019-09-27 11:10 ` Tvrtko Ursulin
2019-09-27 11:25 ` Chris Wilson
2019-09-27 12:08 ` Tvrtko Ursulin
2019-09-27 12:16 ` Chris Wilson
2019-09-27 12:25 ` Tvrtko Ursulin
2019-09-27 12:32 ` Chris Wilson
2019-09-27 13:58 ` Tvrtko Ursulin
2019-09-27 14:10 ` Chris Wilson
2019-09-25 10:01 ` [PATCH 08/27] drm/i915: Move idle barrier cleanup into engine-pm Chris Wilson
2019-09-25 10:01 ` [PATCH 09/27] drm/i915: Drop struct_mutex from around i915_retire_requests() Chris Wilson
2019-09-25 10:01 ` [PATCH 10/27] drm/i915: Remove the GEM idle worker Chris Wilson
2019-09-25 10:01 ` [PATCH 11/27] drm/i915: Merge wait_for_timelines with retire_request Chris Wilson
2019-09-25 10:47 ` Tvrtko Ursulin
2019-09-25 10:54 ` Chris Wilson
2019-09-25 10:01 ` [PATCH 12/27] drm/i915: Move request runtime management onto gt Chris Wilson
2019-09-25 10:57 ` Tvrtko Ursulin
2019-09-25 11:17 ` Chris Wilson
2019-09-25 11:24 ` Chris Wilson
2019-09-25 11:29 ` Chris Wilson
2019-09-25 11:33 ` Chris Wilson
2019-09-25 15:17 ` Tvrtko Ursulin
2019-09-25 10:01 ` [PATCH 13/27] drm/i915: Move global activity tracking from GEM to GT Chris Wilson
2019-09-25 10:01 ` [PATCH 14/27] drm/i915: Expose engine properties via sysfs Chris Wilson
2019-09-27 20:48 ` Rodrigo Vivi
2019-09-25 10:01 ` [PATCH 15/27] drm/i915/execlists: Force preemption Chris Wilson
2019-09-25 10:01 ` [PATCH 16/27] drm/i915: Mark up "sentinel" requests Chris Wilson
2019-09-25 10:01 ` [PATCH 17/27] drm/i915/execlists: Cancel banned contexts on schedule-out Chris Wilson
2019-09-25 10:01 ` [PATCH 18/27] drm/i915: Cancel non-persistent contexts on close Chris Wilson
2019-09-25 10:01 ` [PATCH 19/27] drm/i915: Replace hangcheck by heartbeats Chris Wilson
2019-09-27 8:26 ` Joonas Lahtinen
2019-09-27 9:18 ` Chris Wilson
2019-09-25 10:01 ` [PATCH 20/27] drm/i915: Remove logical HW ID Chris Wilson
2019-09-25 12:41 ` Tvrtko Ursulin
2019-09-25 12:51 ` Chris Wilson
2019-09-25 14:38 ` Tvrtko Ursulin [this message]
2019-09-25 17:59 ` Daniele Ceraolo Spurio
2019-09-25 18:23 ` Matthew Brost
2019-09-25 10:01 ` [PATCH 21/27] drm/i915: Move context management under GEM Chris Wilson
2019-09-26 13:57 ` Tvrtko Ursulin
2019-10-02 16:09 ` Tvrtko Ursulin
2019-10-03 7:35 ` Chris Wilson
2019-09-25 10:01 ` [PATCH 22/27] drm/i915/overlay: Drop struct_mutex guard Chris Wilson
2019-09-25 12:43 ` Tvrtko Ursulin
2019-09-25 12:53 ` Chris Wilson
2019-09-25 13:01 ` Tvrtko Ursulin
2019-09-25 13:11 ` Chris Wilson
2019-09-25 10:01 ` [PATCH 23/27] drm/i915: Drop struct_mutex guard from debugfs/framebuffer_info Chris Wilson
2019-09-25 12:45 ` Tvrtko Ursulin
2019-09-25 10:01 ` [PATCH 24/27] drm/i915: Remove struct_mutex guard for debugfs/opregion Chris Wilson
2019-09-25 12:51 ` Tvrtko Ursulin
2019-09-25 10:01 ` [PATCH 25/27] drm/i915: Drop struct_mutex from suspend state save/restore Chris Wilson
2019-09-25 12:52 ` Tvrtko Ursulin
2019-09-25 10:01 ` [PATCH 26/27] drm/i915/selftests: Drop vestigal struct_mutex guards Chris Wilson
2019-09-25 12:55 ` Tvrtko Ursulin
2019-09-25 10:01 ` [PATCH 27/27] drm/i915: Drop struct_mutex from around GEM initialisation Chris Wilson
2019-09-25 12:56 ` Tvrtko Ursulin
2019-09-25 10:21 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/27] dma-fence: Serialise signal enabling (dma_fence_enable_sw_signaling) Patchwork
2019-09-25 10:34 ` ✗ Fi.CI.SPARSE: " Patchwork
2019-09-25 10:51 ` ✓ Fi.CI.BAT: success " Patchwork
2019-09-26 0:56 ` ✗ Fi.CI.IGT: failure " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=a32e4f08-eb84-7807-5aa7-e2e4e3b5fdf8@linux.intel.com \
--to=tvrtko.ursulin@linux.intel.com \
--cc=chris@chris-wilson.co.uk \
--cc=intel-gfx@lists.freedesktop.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.