public inbox for intel-gfx@lists.freedesktop.org
 help / color / mirror / Atom feed
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: John Harrison <john.c.harrison@intel.com>,
	Intel-GFX@Lists.FreeDesktop.Org,
	Matthew Brost <matthew.brost@intel.com>
Cc: DRI-Devel@Lists.FreeDesktop.Org
Subject: Re: [Intel-gfx] [PATCH 02/12] drm/i915/guc: Don't call ring_is_idle in GuC submission
Date: Tue, 19 Jul 2022 10:49:42 +0100	[thread overview]
Message-ID: <0107930e-bb8b-c7ee-c03c-7e7abf564498@linux.intel.com> (raw)
In-Reply-To: <4f14835e-300e-a1b7-bebb-8ecbb07ab682@intel.com>


On 19/07/2022 01:09, John Harrison wrote:
> On 7/18/2022 05:26, Tvrtko Ursulin wrote:
>>
>> On 13/07/2022 00:31, John.C.Harrison@Intel.com wrote:
>>> From: Matthew Brost <matthew.brost@intel.com>
>>>
>>> The engine registers really shouldn't be touched during GuC submission
>>> as the GuC owns the registers. Don't call ring_is_idle and tie
>>
>> Touch being just read and it is somehow harmful?
> The registers are meaningless when GuC is controlling the submission. 
> The i915 driver has no knowledge of what context is or is not executing 
> on any given engine at any given time. So reading reading the ring 
> registers is incorrect - it can lead to bad assumptions about what state 
> the hardware is in.

Same is actually true with the execlists backend. The code in 
ring_is_idle is not concerning itself with which context is running or 
not. Just that the head/tail/ctl appear idle.

Problem/motivation appears to be on a higher than simply ring registers.

I am not claiming it makes sense with Guc and that it has to remain but 
just suggesting for as a minimum clearer commit message.

>>> intel_engine_is_idle strictly to the engine pm.
>>
>> Strictly seems wrong - it is just ring_is_idle check that is replaced 
>> and not the whole implementation of intel_engine_is_idle.
>>
>>> Because intel_engine_is_idle tied to the engine pm, retire requests
>>> before checking intel_engines_are_idle in gt_drop_caches, and lastly
>> Why is re-ordering important? I at least can't understand it. I hope 
>> it's not working around IGT failures.
> If requests are physically completed but not retired then they will be 
> holding unnecessary PM references. So we need to flush those out before 
> checking for idle.

And if they are not as someone passes in DROP_RESET_ACTIVE? They will 
not retire and code still enters intel_engines_are_idle so that has to 
work, no? Something does not align for me still.

>>> increase the timeout in gt_drop_caches for the intel_engines_are_idle
>>> check.
>>
>> Same here - why?
> @Matthew Brost - do you recall which particular tests were hitting an 
> issue? I'm guessing gem_ctx_create? I believe that's the one that 
> creates and destroys thousands of contexts. That is much slower with GuC 
> (GuC communication required) than with execlists (i915 internal state 
> change only).

And if that is a logically separate change please split the patch up.

Regards,

Tvrtko

> 
> John.
> 
> 
> 
>>
>> Regards,
>>
>> Tvrtko
>>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/gt/intel_engine_cs.c | 13 +++++++++++++
>>>   drivers/gpu/drm/i915/i915_debugfs.c       |  6 +++---
>>>   drivers/gpu/drm/i915/i915_drv.h           |  2 +-
>>>   3 files changed, 17 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
>>> b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>>> index 283870c659911..959a7c92e8f4d 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>>> @@ -1602,6 +1602,9 @@ static bool ring_is_idle(struct intel_engine_cs 
>>> *engine)
>>>   {
>>>       bool idle = true;
>>>   +    /* GuC submission shouldn't access HEAD & TAIL via MMIO */
>>> +    GEM_BUG_ON(intel_engine_uses_guc(engine));
>>> +
>>>       if (I915_SELFTEST_ONLY(!engine->mmio_base))
>>>           return true;
>>>   @@ -1668,6 +1671,16 @@ bool intel_engine_is_idle(struct 
>>> intel_engine_cs *engine)
>>>       if (!i915_sched_engine_is_empty(engine->sched_engine))
>>>           return false;
>>>   +    /*
>>> +     * We shouldn't touch engine registers with GuC submission as 
>>> the GuC
>>> +     * owns the registers. Let's tie the idle to engine pm, at worst 
>>> this
>>> +     * function sometimes will falsely report non-idle when idle 
>>> during the
>>> +     * delay to retire requests or with virtual engines and a request
>>> +     * running on another instance within the same class / submit mask.
>>> +     */
>>> +    if (intel_engine_uses_guc(engine))
>>> +        return false;
>>> +
>>>       /* Ring stopped? */
>>>       return ring_is_idle(engine);
>>>   }
>>> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
>>> b/drivers/gpu/drm/i915/i915_debugfs.c
>>> index 94e5c29d2ee3a..ee5334840e9cb 100644
>>> --- a/drivers/gpu/drm/i915/i915_debugfs.c
>>> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
>>> @@ -654,13 +654,13 @@ gt_drop_caches(struct intel_gt *gt, u64 val)
>>>   {
>>>       int ret;
>>>   +    if (val & DROP_RETIRE || val & DROP_RESET_ACTIVE)
>>> +        intel_gt_retire_requests(gt);
>>> +
>>>       if (val & DROP_RESET_ACTIVE &&
>>>           wait_for(intel_engines_are_idle(gt), 
>>> I915_IDLE_ENGINES_TIMEOUT))
>>>           intel_gt_set_wedged(gt);
>>>   -    if (val & DROP_RETIRE)
>>> -        intel_gt_retire_requests(gt);
>>> -
>>>       if (val & (DROP_IDLE | DROP_ACTIVE)) {
>>>           ret = intel_gt_wait_for_idle(gt, MAX_SCHEDULE_TIMEOUT);
>>>           if (ret)
>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h 
>>> b/drivers/gpu/drm/i915/i915_drv.h
>>> index c22f29c3faa0e..53c7474dde495 100644
>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>> @@ -278,7 +278,7 @@ struct i915_gem_mm {
>>>       u32 shrink_count;
>>>   };
>>>   -#define I915_IDLE_ENGINES_TIMEOUT (200) /* in ms */
>>> +#define I915_IDLE_ENGINES_TIMEOUT (500) /* in ms */
>>>     unsigned long i915_fence_context_timeout(const struct 
>>> drm_i915_private *i915,
>>>                        u64 context);
> 

  reply	other threads:[~2022-07-19  9:49 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-12 23:31 [Intel-gfx] [PATCH 00/12] Random assortment of (mostly) GuC related patches John.C.Harrison
2022-07-12 23:31 ` [Intel-gfx] [PATCH 01/12] drm/i915: Remove bogus GEM_BUG_ON in unpark John.C.Harrison
2022-07-18 12:15   ` Tvrtko Ursulin
2022-07-19  0:05     ` John Harrison
2022-07-19  9:42       ` Tvrtko Ursulin
2022-07-21  0:54         ` John Harrison
2022-07-21  9:24           ` Tvrtko Ursulin
2022-07-22 19:09             ` John Harrison
2022-07-12 23:31 ` [Intel-gfx] [PATCH 02/12] drm/i915/guc: Don't call ring_is_idle in GuC submission John.C.Harrison
2022-07-18 12:26   ` Tvrtko Ursulin
2022-07-19  0:09     ` John Harrison
2022-07-19  9:49       ` Tvrtko Ursulin [this message]
2022-07-19 10:14         ` Tvrtko Ursulin
2022-07-12 23:31 ` [Intel-gfx] [PATCH 03/12] drm/i915/guc: Fix issues with live_preempt_cancel John.C.Harrison
2022-07-12 23:31 ` [Intel-gfx] [PATCH 04/12] drm/i915/guc: Add GuC <-> kernel time stamp translation information John.C.Harrison
2022-07-12 23:31 ` [Intel-gfx] [PATCH 05/12] drm/i915/guc: Record CTB info in error logs John.C.Harrison
2022-07-12 23:31 ` [Intel-gfx] [PATCH 06/12] drm/i915/guc: Use streaming loads to speed up dumping the guc log John.C.Harrison
2022-07-22 20:05   ` John Harrison
2022-07-12 23:31 ` [Intel-gfx] [PATCH 07/12] drm/i915/guc: Route semaphores to GuC for Gen12+ John.C.Harrison
2022-07-13  0:51   ` Matthew Brost
2022-07-15 17:21   ` Ceraolo Spurio, Daniele
2022-07-12 23:31 ` [Intel-gfx] [PATCH 08/12] drm/i915/guc: Add selftest for a hung GuC John.C.Harrison
2022-07-12 23:31 ` [Intel-gfx] [PATCH 09/12] drm/i915/selftest: Cope with not having an RCS engine John.C.Harrison
2022-07-13  0:48   ` Matthew Brost
2022-07-12 23:31 ` [Intel-gfx] [PATCH 10/12] drm/i915/guc: Support larger contexts on newer hardware John.C.Harrison
2022-07-18 12:35   ` Tvrtko Ursulin
2022-07-19  0:13     ` John Harrison
2022-07-19  9:56       ` Tvrtko Ursulin
2022-07-22 19:32         ` John Harrison
2022-07-25 11:24           ` Tvrtko Ursulin
2022-07-12 23:31 ` [Intel-gfx] [PATCH 11/12] drm/i915/guc: Don't abort on CTB_UNUSED status John.C.Harrison
2022-07-18 12:36   ` Tvrtko Ursulin
2022-07-19  0:16     ` John Harrison
2022-07-12 23:31 ` [Intel-gfx] [PATCH 12/12] drm/i915/guc: Add a helper for log buffer size John.C.Harrison
2022-07-13  0:46   ` Matthew Brost
2022-07-13  0:31 ` [Intel-gfx] ✗ Fi.CI.BUILD: warning for Random assortment of (mostly) GuC related patches Patchwork
2022-07-13 20:09 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Random assortment of (mostly) GuC related patches (rev2) Patchwork
2022-07-13 20:27 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2022-07-14  1:41 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0107930e-bb8b-c7ee-c03c-7e7abf564498@linux.intel.com \
    --to=tvrtko.ursulin@linux.intel.com \
    --cc=DRI-Devel@Lists.FreeDesktop.Org \
    --cc=Intel-GFX@Lists.FreeDesktop.Org \
    --cc=john.c.harrison@intel.com \
    --cc=matthew.brost@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox