Re: [PATCH 20/27] drm/i915: Remove logical HW ID

From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 20/27] drm/i915: Remove logical HW ID
Date: Wed, 25 Sep 2019 15:38:59 +0100	[thread overview]
Message-ID: <a32e4f08-eb84-7807-5aa7-e2e4e3b5fdf8@linux.intel.com> (raw)
In-Reply-To: <156941589825.4979.9378610445541130440@skylake-alporthouse-com>

On 25/09/2019 13:51, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-09-25 13:41:10)
>> [+ Daniele, I think he might want to have a look at this.]
>>
>> On 25/09/2019 11:01, Chris Wilson wrote:
>>> With the introduction of ctx->engines[] we allow multiple logical
>>> contexts to be used on the same engine (e.g. with virtual engines). Each
>>> logical context requires a unique tag in order for context-switching to
>>> occur correctly between them.
>>>
>>> We only need to keep a unique tag for the active lifetime of the
>>> context, and for as long as we need to identify that context. The HW
>>> uses the tag to determine if it should use a lite-restore (why not the
>>> LRCA?) and passes the tag back for various status identifies. The only
>>> status we need to track is for OA, so when using perf, we assign the
>>> specific context a unique tag.
>>>
>>> Fixes: 976b55f0e1db ("drm/i915: Allow a context to define its set of engines")
>>
>> Fixes? Why?
> 
> The above paragraph explains that as we give two distinct contexts the
> same tag then it will only perform (according to bspec) a lite-restore.

Right I see. But not with virtual engines per se, but even if someone 
created an engine map with two same engines in it. Virtual engines are 
then a super set of this problem.

>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
>>> index da4b45aa84b1..7bce176e696c 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
>>> @@ -296,10 +296,12 @@ struct intel_engine_cs {
>>>        u8 uabi_class;
>>>        u8 uabi_instance;
>>>    
>>> +     u32 uabi_capabilities;
>>>        u32 context_size;
>>>        u32 mmio_base;
>>>    
>>> -     u32 uabi_capabilities;
>>> +     unsigned int context_tag;
>>> +#define NUM_CONTEXT_TAG (256)
>>
>> AFAICT this define now defines how deep ELSP we can have before we start
>> getting hard to debug problems. GuC angle needs considering here.
> 
> The guc is immaterial here, since they use their own mechanism. It just
> needs to be a value larger than 2 * EXECLIST_MAX_PORTS and less then
> BIT(10).

I don't know how GuC (will) work so hopefully Daniele can confirm or 
deny this won't be a problem there.

>   
>>> @@ -978,6 +965,15 @@ __execlists_schedule_in(struct i915_request *rq)
>>>    
>>>        intel_context_get(ce);
>>>    
>>> +     if (ce->tag) {
>>> +             ce->lrc_desc |= (u64)ce->tag << 32;
>>> +     } else {
>>> +             ce->lrc_desc &= ~GENMASK_ULL(47, 37);
>>> +             ce->lrc_desc |=
>>> +                     (u64)(engine->context_tag++ % NUM_CONTEXT_TAG) <<
>>> +                     GEN11_SW_CTX_ID_SHIFT;
>>> +     }
>>
>> So hw id is valid only while context is in ELSP and it changes with
>> every submission except in the OA case?
> 
> Aye. OA being a pita.
>   
>> Shifts and masks look dodgy. For >= gen11 the current code has the hw_id
>> in [37, 42], otherwise [32, 52]. This patch does not seem to handle the
>> differences between gens.
> 
> Because the values we supply do not matter (they are cookies), just the
> lifetime.

Okay so you are choosing to not use some of the available bits since we 
don't need all 20 (pre gen11), or even 10 (gen11+). This definitely 
deserves a comment. And another assert with NUM_CONTEXT_TAG relating to 
available bits.

> 
>>> +
>>>        intel_gt_pm_get(engine->gt);
>>>        execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_IN);
>>>        intel_engine_context_in(engine);
>>> @@ -2224,7 +2220,6 @@ static void execlists_context_unpin(struct intel_context *ce)
>>>        check_redzone((void *)ce->lrc_reg_state - LRC_STATE_PN * PAGE_SIZE,
>>>                      ce->engine);
>>>    
>>> -     i915_gem_context_unpin_hw_id(ce->gem_context);
>>>        i915_gem_object_unpin_map(ce->state->obj);
>>>        intel_ring_reset(ce->ring, ce->ring->tail);
>>>    }
>>> @@ -2274,18 +2269,12 @@ __execlists_context_pin(struct intel_context *ce,
>>>                goto unpin_active;
>>>        }
>>>    
>>> -     ret = i915_gem_context_pin_hw_id(ce->gem_context);
>>> -     if (ret)
>>> -             goto unpin_map;
>>> -
>>>        ce->lrc_desc = lrc_descriptor(ce, engine);
>>>        ce->lrc_reg_state = vaddr + LRC_STATE_PN * PAGE_SIZE;
>>>        __execlists_update_reg_state(ce, engine);
>>>    
>>>        return 0;
>>>    
>>> -unpin_map:
>>> -     i915_gem_object_unpin_map(ce->state->obj);
>>>    unpin_active:
>>>        intel_context_active_release(ce);
>>>    err:
>>> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
>>> index 343d79c1cb7e..04a5a0d90823 100644
>>> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
>>> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
>>> @@ -1564,27 +1564,10 @@ vgpu_id_show(struct device *dev, struct device_attribute *attr,
>>>        return sprintf(buf, "\n");
>>>    }
>>>    
>>> -static ssize_t
>>> -hw_id_show(struct device *dev, struct device_attribute *attr,
>>> -        char *buf)
>>> -{
>>> -     struct mdev_device *mdev = mdev_from_dev(dev);
>>> -
>>> -     if (mdev) {
>>> -             struct intel_vgpu *vgpu = (struct intel_vgpu *)
>>> -                     mdev_get_drvdata(mdev);
>>> -             return sprintf(buf, "%u\n",
>>> -                            vgpu->submission.shadow[0]->gem_context->hw_id);
>>> -     }
>>> -     return sprintf(buf, "\n");
>>> -}
>>> -
>>>    static DEVICE_ATTR_RO(vgpu_id);
>>> -static DEVICE_ATTR_RO(hw_id);
>>>    
>>>    static struct attribute *intel_vgpu_attrs[] = {
>>>        &dev_attr_vgpu_id.attr,
>>> -     &dev_attr_hw_id.attr,
>>>        NULL
>>>    };
>>>    
>>> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
>>> index fd41505d52ec..8caaa446490f 100644
>>> --- a/drivers/gpu/drm/i915/i915_debugfs.c
>>> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
>>> @@ -1504,9 +1504,6 @@ static int i915_context_status(struct seq_file *m, void *unused)
>>>                struct intel_context *ce;
>>>    
>>>                seq_puts(m, "HW context ");
>>> -             if (!list_empty(&ctx->hw_id_link))
>>> -                     seq_printf(m, "%x [pin %u]", ctx->hw_id,
>>> -                                atomic_read(&ctx->hw_id_pin_count));
>>>                if (ctx->pid) {
>>>                        struct task_struct *task;
>>>    
>>> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
>>> index a86d28bda6dd..47239df653f2 100644
>>> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
>>> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
>>> @@ -471,9 +471,9 @@ static void error_print_context(struct drm_i915_error_state_buf *m,
>>>                                const char *header,
>>>                                const struct drm_i915_error_context *ctx)
>>>    {
>>> -     err_printf(m, "%s%s[%d] hw_id %d, prio %d, guilty %d active %d\n",
>>> -                header, ctx->comm, ctx->pid, ctx->hw_id,
>>> -                ctx->sched_attr.priority, ctx->guilty, ctx->active);
>>> +     err_printf(m, "%s%s[%d] prio %d, guilty %d active %d\n",
>>> +                header, ctx->comm, ctx->pid, ctx->sched_attr.priority,
>>> +                ctx->guilty, ctx->active);
>>>    }
>>>    
>>>    static void error_print_engine(struct drm_i915_error_state_buf *m,
>>> @@ -1262,7 +1262,6 @@ static bool record_context(struct drm_i915_error_context *e,
>>>                rcu_read_unlock();
>>>        }
>>>    
>>> -     e->hw_id = ctx->hw_id;
>>>        e->sched_attr = ctx->sched;
>>>        e->guilty = atomic_read(&ctx->guilty_count);
>>>        e->active = atomic_read(&ctx->active_count);
>>> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
>>> index caaed5093d95..4dc36d6ee3a2 100644
>>> --- a/drivers/gpu/drm/i915/i915_gpu_error.h
>>> +++ b/drivers/gpu/drm/i915/i915_gpu_error.h
>>> @@ -117,7 +117,6 @@ struct i915_gpu_state {
>>>                struct drm_i915_error_context {
>>>                        char comm[TASK_COMM_LEN];
>>>                        pid_t pid;
>>> -                     u32 hw_id;
>>>                        int active;
>>>                        int guilty;
>>>                        struct i915_sched_attr sched_attr;
>>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>>> index 80055501eccb..d36ba248d438 100644
>>> --- a/drivers/gpu/drm/i915/i915_perf.c
>>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>>> @@ -1283,22 +1283,15 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
>>>                } else {
>>>                        stream->specific_ctx_id_mask =
>>>                                (1U << GEN8_CTX_ID_WIDTH) - 1;
>>> -                     stream->specific_ctx_id =
>>> -                             upper_32_bits(ce->lrc_desc);
>>> -                     stream->specific_ctx_id &=
>>> -                             stream->specific_ctx_id_mask;
>>> +                     stream->specific_ctx_id = stream->specific_ctx_id_mask;
>>>                }
>>>                break;
>>>    
>>>        case 11:
>>>        case 12: {
>>>                stream->specific_ctx_id_mask =
>>> -                     ((1U << GEN11_SW_CTX_ID_WIDTH) - 1) << (GEN11_SW_CTX_ID_SHIFT - 32) |
>>> -                     ((1U << GEN11_ENGINE_INSTANCE_WIDTH) - 1) << (GEN11_ENGINE_INSTANCE_SHIFT - 32) |
>>> -                     ((1 << GEN11_ENGINE_CLASS_WIDTH) - 1) << (GEN11_ENGINE_CLASS_SHIFT - 32);
>>> -             stream->specific_ctx_id = upper_32_bits(ce->lrc_desc);
>>> -             stream->specific_ctx_id &=
>>> -                     stream->specific_ctx_id_mask;
>>> +                     ((1U << GEN11_SW_CTX_ID_WIDTH) - 1) << (GEN11_SW_CTX_ID_SHIFT - 32);
>>> +             stream->specific_ctx_id = stream->specific_ctx_id_mask;
>>>                break;
>>>        }
>>>    
>>> @@ -1306,6 +1299,8 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
>>>                MISSING_CASE(INTEL_GEN(i915));
>>>        }
>>>    
>>> +     ce->tag = stream->specific_ctx_id_mask;
>>> +
>>>        DRM_DEBUG_DRIVER("filtering on ctx_id=0x%x ctx_id_mask=0x%x\n",
>>>                         stream->specific_ctx_id,
>>>                         stream->specific_ctx_id_mask);
>>> @@ -1324,12 +1319,14 @@ static void oa_put_render_ctx_id(struct i915_perf_stream *stream)
>>>    {
>>>        struct intel_context *ce;
>>>    
>>> -     stream->specific_ctx_id = INVALID_CTX_ID;
>>> -     stream->specific_ctx_id_mask = 0;
>>> -
>>>        ce = fetch_and_zero(&stream->pinned_ctx);
>>> -     if (ce)
>>> +     if (ce) {
>>> +             ce->tag = 0;
>>>                intel_context_unpin(ce);
>>> +     }
>>> +
>>> +     stream->specific_ctx_id = INVALID_CTX_ID;
>>> +     stream->specific_ctx_id_mask = 0;
>>
>> OA hunks looks dodgy as well. ce->tag is set to
>> stream->specific_ctx_id_mask while I think it should be id. Furthermore
>> id is assigned from mask which is fixed and not unique for different
>> contexts.
> 
> It is singular in the entire system. The mask is a guaranteed unique
> value.

What is singular? The OA context and it's id? I assumed what needs to be 
done is fix hw_id for _all_ contexts while OA is sampling. But on a more 
detailed look this assumption does not seem to hold in code. Okay, I 
declare to not know how OA works well enough.

> 
>>> diff --git a/drivers/gpu/drm/i915/selftests/i915_vma.c b/drivers/gpu/drm/i915/selftests/i915_vma.c
>>> index 1c9db08f7c28..ac1ff558eb90 100644
>>> --- a/drivers/gpu/drm/i915/selftests/i915_vma.c
>>> +++ b/drivers/gpu/drm/i915/selftests/i915_vma.c
>>> @@ -170,7 +170,7 @@ static int igt_vma_create(void *arg)
>>>                }
>>>    
>>>                nc = 0;
>>> -             for_each_prime_number(num_ctx, MAX_CONTEXT_HW_ID) {
>>> +             for_each_prime_number(num_ctx, 2 * NUM_CONTEXT_TAG) {
>>>                        for (; nc < num_ctx; nc++) {
>>>                                ctx = mock_context(i915, "mock");
>>>                                if (!ctx)
>>>
>>
>> Unless I am missing something I see this patch as simplification and not
>> a fix. If we can get away with it in the GuC world then it's quite a big
>> simplification so it's fine by me.
> 
> Bspec says that for lrc_desc with matching tags (at least for gen8-gen11
> I believe), it performs a lite restore (regardless of lrca). Ergo it is
> quite a major fix.

Yes I see that now, my bad. This one will not lend itself well to 
backporting.. But I don't think there is a lighter-weight solution. 
Moving hw_id to intel_context would also end up large-ish I think. 
Because of the locking requirement etc.

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx