From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>,
intel-gfx@lists.freedesktop.org
Cc: Owen Zhang <owen.zhang@intel.com>
Subject: Re: [PATCH] drm/i915: fix SFC reset flow
Date: Wed, 18 Sep 2019 14:42:37 +0100 [thread overview]
Message-ID: <3538e930-0ac3-af8d-74e5-a48036282a96@linux.intel.com> (raw)
In-Reply-To: <5c0958f8-be9e-bfb9-fc41-223423a25d5e@intel.com>
On 17/09/2019 19:29, Daniele Ceraolo Spurio wrote:
>
>
> On 9/17/2019 3:22 AM, Tvrtko Ursulin wrote:
>>
>> On 16/09/2019 22:41, Daniele Ceraolo Spurio wrote:
>>> Our assumption that the we can ask the HW to lock the SFC even if not
>>> currently in use does not match the HW commitment. The expectation from
>>> the HW is that SW will not try to lock the SFC if the engine is not
>>> using it and if we do that the behavior is undefined; on ICL the HW
>>> ends up to returning the ack and ignoring our lock request, but this is
>>> not guaranteed and we shouldn't expect it going forward.
>>>
>>> Reported-by: Owen Zhang <owen.zhang@intel.com>
>>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>>> ---
>>> drivers/gpu/drm/i915/gt/intel_reset.c | 25 +++++++++++++++++--------
>>> 1 file changed, 17 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c
>>> b/drivers/gpu/drm/i915/gt/intel_reset.c
>>> index 8327220ac558..900958804bd5 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
>>> @@ -352,13 +352,15 @@ static u32 gen11_lock_sfc(struct
>>> intel_engine_cs *engine)
>>> }
>>> /*
>>> - * Tell the engine that a software reset is going to happen. The
>>> engine
>>> - * will then try to force lock the SFC (if currently locked, it
>>> will
>>> - * remain so until we tell the engine it is safe to unlock; if
>>> currently
>>> - * unlocked, it will ignore this and all new lock requests). If SFC
>>> - * ends up being locked to the engine we want to reset, we have
>>> to reset
>>> - * it as well (we will unlock it once the reset sequence is
>>> completed).
>>> + * If the engine is using a SFC, tell the engine that a software
>>> reset
>>> + * is going to happen. The engine will then try to force lock
>>> the SFC.
>>> + * If SFC ends up being locked to the engine we want to reset,
>>> we have
>>> + * to reset it as well (we will unlock it once the reset
>>> sequence is
>>> + * completed).
>>> */
>>> + if (!(intel_uncore_read_fw(uncore, sfc_usage) & sfc_usage_bit))
>>> + return 0;
>>> +
>>> rmw_set_fw(uncore, sfc_forced_lock, sfc_forced_lock_bit);
>>> if (__intel_wait_for_register_fw(uncore,
>>> @@ -366,10 +368,13 @@ static u32 gen11_lock_sfc(struct
>>> intel_engine_cs *engine)
>>> sfc_forced_lock_ack_bit,
>>> sfc_forced_lock_ack_bit,
>>> 1000, 0, NULL)) {
>>> - DRM_DEBUG_DRIVER("Wait for SFC forced lock ack failed\n");
>>> + /* did we race the unlock? */
>>
>> How do we race here? Are we not in complete control of the engine at
>> this point so the status of this engine using SFC or not should be
>> static, no?
>
> The hang detection might be due to a long non-preemptable batch, in
> which case there is in theory a chance for the batch to release the SFC
> while we try to lock it. The chance is incredibly small though, so am I
> being too paranoid?
I get it now, it is a legitimate race.
>
>>
>>> + if (intel_uncore_read_fw(uncore, sfc_usage) & sfc_usage_bit)
>>> + DRM_ERROR("Wait for SFC forced lock ack failed\n");
>>> return 0;
>>> }
>>> + /* The HW could return the ack even if the sfc is not in use */
>>
>> But the function checked whether SFC wasn't in use and bailed out
>> early - so is this comment relevant? (I understand it is true against
>> the specs just wondering about our exact code.)
>>
>
> Same rationale as the above, if the engine relased the SFC while we were
> locking it, the locking might have been rejected, but on ICL we still
> get the ack.
>
>>> if (intel_uncore_read_fw(uncore, sfc_usage) & sfc_usage_bit)
>>> return sfc_reset_bit;
>>> @@ -382,6 +387,7 @@ static void gen11_unlock_sfc(struct
>>> intel_engine_cs *engine)
>>> u8 vdbox_sfc_access =
>>> RUNTIME_INFO(engine->i915)->vdbox_sfc_access;
>>> i915_reg_t sfc_forced_lock;
>>> u32 sfc_forced_lock_bit;
>>> + u32 lock;
>>> switch (engine->class) {
>>> case VIDEO_DECODE_CLASS:
>>> @@ -401,7 +407,10 @@ static void gen11_unlock_sfc(struct
>>> intel_engine_cs *engine)
>>> return;
>>> }
>>> - rmw_clear_fw(uncore, sfc_forced_lock, sfc_forced_lock_bit);
>>> + lock = intel_uncore_read_fw(uncore, sfc_forced_lock);
>>> + if (lock & sfc_forced_lock_bit)
>>> + intel_uncore_write_fw(uncore, sfc_forced_lock,
>>> + lock & ~sfc_forced_lock_bit);
>>
>> Here we can't rely on the return code from gen11_lock_sfc and have to
>> read the register ourselves? I guess it depends on my question about
>> the race comment.
>>
>> In addition to this I now see that gen11_reset_engines does not use
>> the return value from gen11_lock_sfc when deciding which engines it
>> needs to unlock. Should we change that as well?
>
> Paranoia here as well, in case something went wrong with the locking I'd
> like to be sure the unlocking can still be performed independently so we
> can recover. e.g. the locking might have succeeded after we hit the
> timeout in gen11_lock_sfc , in which case the return from that function
> won't reflect the status of the HW.
Put in a comment here explaining what's the story and with that:
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Another option is to cross-check software vs hardware locked status at
this point and to log a mismatch. Just so we get data how often this
happens in practice. This is probably best as follow up.
Regards,
Tvrtko
>
> Thanks,
> Daniele
>
>>
>>
>>> }
>>> static int gen11_reset_engines(struct intel_gt *gt,
>>>
>>
>> Regards,
>>
>> Tvrtko
>
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
next prev parent reply other threads:[~2019-09-18 13:42 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-09-16 21:41 [PATCH] drm/i915: fix SFC reset flow Daniele Ceraolo Spurio
2019-09-16 22:41 ` ✓ Fi.CI.BAT: success for " Patchwork
2019-09-17 5:37 ` ✓ Fi.CI.IGT: " Patchwork
2019-09-17 7:57 ` [PATCH] " Chris Wilson
2019-09-17 18:36 ` Daniele Ceraolo Spurio
2019-09-17 18:57 ` Chris Wilson
2019-09-17 19:45 ` Daniele Ceraolo Spurio
2019-09-17 19:49 ` Chris Wilson
2019-09-17 20:53 ` Daniele Ceraolo Spurio
2019-09-17 10:22 ` Tvrtko Ursulin
2019-09-17 18:29 ` Daniele Ceraolo Spurio
2019-09-18 13:42 ` Tvrtko Ursulin [this message]
2019-09-17 16:06 ` Chris Wilson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=3538e930-0ac3-af8d-74e5-a48036282a96@linux.intel.com \
--to=tvrtko.ursulin@linux.intel.com \
--cc=daniele.ceraolospurio@intel.com \
--cc=intel-gfx@lists.freedesktop.org \
--cc=owen.zhang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.