From: Jani Nikula <jani.nikula@linux.intel.com>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
Andi Shyti <andi.shyti@linux.intel.com>,
intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Subject: Re: [Intel-gfx] [PATCH] drm/i915/gt: Retry RING_HEAD reset until it sticks
Date: Tue, 27 Dec 2022 18:33:16 +0200 [thread overview]
Message-ID: <87h6xg24dv.fsf@intel.com> (raw)
In-Reply-To: <d99c1147-fc3f-71f0-c115-49febbf2aa3a@linux.intel.com>
On Fri, 23 Dec 2022, Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> On 23/12/2022 12:18, Andi Shyti wrote:
>> From: Chris Wilson <chris@chris-wilson.co.uk>
>>
>> On Haswell, in particular, we see an issue where resets fails because
>> the engine resumes from an incorrect RING_HEAD. Since the RING_HEAD
>> doesn't point to the remaining requests to re-run, but may instead point
>> into the uninitialised portion of the ring, the GPU may be then fed
>> invalid instructions from a privileged context, often pushing the GPU
>> into an unrecoverable hang.
>>
>> If at first the write doesn't succeed, try, try again.
>>
>> References: https://gitlab.freedesktop.org/drm/intel/-/issues/5432
>> References: https://gitlab.freedesktop.org/drm/intel/-/issues/3303
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Andrzej Hajda <andrzej.hajda@intel.com>
>> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
>> ---
>> .../gpu/drm/i915/gt/intel_ring_submission.c | 44 +++++++++++++------
>> drivers/gpu/drm/i915/i915_utils.h | 8 ++++
>> 2 files changed, 38 insertions(+), 14 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>> index 827adb0cfaea6..cdf283f5b1427 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
>> @@ -192,6 +192,7 @@ static bool stop_ring(struct intel_engine_cs *engine)
>> static int xcs_resume(struct intel_engine_cs *engine)
>> {
>> struct intel_ring *ring = engine->legacy.ring;
>> + ktime_t kt;
>>
>> ENGINE_TRACE(engine, "ring:{HEAD:%04x, TAIL:%04x}\n",
>> ring->head, ring->tail);
>> @@ -230,9 +231,20 @@ static int xcs_resume(struct intel_engine_cs *engine)
>> set_pp_dir(engine);
>>
>> /* First wake the ring up to an empty/idle ring */
>> - ENGINE_WRITE_FW(engine, RING_HEAD, ring->head);
>> + until_timeout_ns(kt, 2 * NSEC_PER_MSEC) {
>> + ENGINE_WRITE_FW(engine, RING_HEAD, ring->head);
>> + if (ENGINE_READ_FW(engine, RING_HEAD) == ring->head)
>> + break;
>> + }
>
> 2ms?! Shudder..
>
> #define done \
> ({ \
> ENGINE_WRITE_FW(engine, RING_HEAD, ring->head); \
> ENGINE_READ_FW(engine, RING_HEAD) == ring->head; \
> })
> _wait_for_atomic(done, 2 * USEC_PER_MSEC, needs_to_be_atomic_or_not?);
> #undef done
>
> Should work and avoid the need to add yet another helper, please
> double-check. Not as pretty, but accumulating generic sounding helpers
> in i915_utils.h is a bit frowned upon.
Yeah, please no more helpers like this. They're not helping.
BR,
Jani.
>
> Regards,
>
> Tvrtko
>
>> +
>> ENGINE_WRITE_FW(engine, RING_TAIL, ring->head);
>> - ENGINE_POSTING_READ(engine, RING_TAIL);
>> + if (ENGINE_READ_FW(engine, RING_HEAD) != ENGINE_READ_FW(engine, RING_TAIL)) {
>> + ENGINE_TRACE(engine, "failed to reset empty ring: [%x, %x]: %x\n",
>> + ENGINE_READ_FW(engine, RING_HEAD),
>> + ENGINE_READ_FW(engine, RING_TAIL),
>> + ring->head);
>> + goto err;
>> + }
>>
>> ENGINE_WRITE_FW(engine, RING_CTL,
>> RING_CTL_SIZE(ring->size) | RING_VALID);
>> @@ -241,12 +253,16 @@ static int xcs_resume(struct intel_engine_cs *engine)
>> if (__intel_wait_for_register_fw(engine->uncore,
>> RING_CTL(engine->mmio_base),
>> RING_VALID, RING_VALID,
>> - 5000, 0, NULL))
>> + 5000, 0, NULL)) {
>> + ENGINE_TRACE(engine, "failed to restart\n");
>> goto err;
>> + }
>>
>> - if (GRAPHICS_VER(engine->i915) > 2)
>> + if (GRAPHICS_VER(engine->i915) > 2) {
>> ENGINE_WRITE_FW(engine,
>> RING_MI_MODE, _MASKED_BIT_DISABLE(STOP_RING));
>> + ENGINE_POSTING_READ(engine, RING_MI_MODE);
>> + }
>>
>> /* Now awake, let it get started */
>> if (ring->tail != ring->head) {
>> @@ -259,16 +275,16 @@ static int xcs_resume(struct intel_engine_cs *engine)
>> return 0;
>>
>> err:
>> - drm_err(&engine->i915->drm,
>> - "%s initialization failed; "
>> - "ctl %08x (valid? %d) head %08x [%08x] tail %08x [%08x] start %08x [expected %08x]\n",
>> - engine->name,
>> - ENGINE_READ(engine, RING_CTL),
>> - ENGINE_READ(engine, RING_CTL) & RING_VALID,
>> - ENGINE_READ(engine, RING_HEAD), ring->head,
>> - ENGINE_READ(engine, RING_TAIL), ring->tail,
>> - ENGINE_READ(engine, RING_START),
>> - i915_ggtt_offset(ring->vma));
>> + ENGINE_TRACE(engine,
>> + "initialization failed; "
>> + "ctl %08x (valid? %d) head %08x [%08x] tail %08x [%08x] start %08x [expected %08x]\n",
>> + ENGINE_READ(engine, RING_CTL),
>> + ENGINE_READ(engine, RING_CTL) & RING_VALID,
>> + ENGINE_READ(engine, RING_HEAD), ring->head,
>> + ENGINE_READ(engine, RING_TAIL), ring->tail,
>> + ENGINE_READ(engine, RING_START),
>> + i915_ggtt_offset(ring->vma));
>> + GEM_TRACE_DUMP();
>> return -EIO;
>> }
>>
>> diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h
>> index b64192d9c7daa..f24a25c0685e1 100644
>> --- a/drivers/gpu/drm/i915/i915_utils.h
>> +++ b/drivers/gpu/drm/i915/i915_utils.h
>> @@ -254,6 +254,14 @@ wait_remaining_ms_from_jiffies(unsigned long timestamp_jiffies, int to_wait_ms)
>> }
>> }
>>
>> +/**
>> + * until_timeout_ns - Keep retrying (busy spin) until the duration has passed
>> + */
>> +#define until_timeout_ns(end, timeout_ns) \
>> + for ((end) = ktime_get() + (timeout_ns); \
>> + ktime_before(ktime_get(), (end)); \
>> + cpu_relax())
>> + > /**
>> * __wait_for - magic wait macro
>> *
--
Jani Nikula, Intel Open Source Graphics Center
next prev parent reply other threads:[~2022-12-27 16:33 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-12-23 12:18 [Intel-gfx] [PATCH] drm/i915/gt: Retry RING_HEAD reset until it sticks Andi Shyti
2022-12-23 12:18 ` Andi Shyti
2022-12-23 13:15 ` [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915/gt: Retry RING_HEAD reset until it sticks (rev2) Patchwork
2022-12-23 13:21 ` [Intel-gfx] [PATCH] drm/i915/gt: Retry RING_HEAD reset until it sticks Tvrtko Ursulin
2022-12-27 16:33 ` Jani Nikula [this message]
2022-12-23 14:30 ` [Intel-gfx] ✓ Fi.CI.IGT: success for drm/i915/gt: Retry RING_HEAD reset until it sticks (rev2) Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87h6xg24dv.fsf@intel.com \
--to=jani.nikula@linux.intel.com \
--cc=andi.shyti@linux.intel.com \
--cc=andrzej.hajda@intel.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=intel-gfx@lists.freedesktop.org \
--cc=tvrtko.ursulin@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.