Re: [Intel-gfx] [PATCH] drm/i915/gt: Reset twice


From: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: Andi Shyti <andi.shyti@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org, stable@vger.kernel.org,
	Chris Wilson <chris@chris-wilson.co.uk>,
	dri-devel@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH] drm/i915/gt: Reset twice
Date: Mon, 12 Dec 2022 11:55:10 -0500
Message-ID: <Y5dc7vhfh6yixFRo@intel.com>
In-Reply-To: <20221212161338.1007659-1-andi.shyti@linux.intel.com>

On Mon, Dec 12, 2022 at 05:13:38PM +0100, Andi Shyti wrote:
> From: Chris Wilson <chris@chris-wilson.co.uk>
> 
> After applying an engine reset, on some platforms like Jasperlake, we
> occasionally detect that the engine state is not cleared until shortly
> after the resume. As we try to resume the engine with volatile internal
> state, the first request fails with a spurious CS event (it looks like
> it reports a lite-restore to the hung context, instead of the expected
> idle->active context switch).
> 
> Signed-off-by: Chris Wilson <hris@chris-wilson.co.uk>

There's a typo in the signature email I'm afraid...

Other than that, have we checked the possibility of using the driver-initiated FLR (function-level reset) bit
instead of this second loop? That should be the right way to guarantee everything is
cleared on gen11+...

> Cc: stable@vger.kernel.org
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_reset.c | 34 ++++++++++++++++++++++-----
>  1 file changed, 28 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index ffde89c5835a4..88dfc0c5316ff 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -268,6 +268,7 @@ static int ilk_do_reset(struct intel_gt *gt, intel_engine_mask_t engine_mask,
>  static int gen6_hw_domain_reset(struct intel_gt *gt, u32 hw_domain_mask)
>  {
>  	struct intel_uncore *uncore = gt->uncore;
> +	int loops = 2;
>  	int err;
>  
>  	/*
> @@ -275,18 +276,39 @@ static int gen6_hw_domain_reset(struct intel_gt *gt, u32 hw_domain_mask)
>  	 * for fifo space for the write or forcewake the chip for
>  	 * the read
>  	 */
> -	intel_uncore_write_fw(uncore, GEN6_GDRST, hw_domain_mask);
> +	do {
> +		intel_uncore_write_fw(uncore, GEN6_GDRST, hw_domain_mask);
>  
> -	/* Wait for the device to ack the reset requests */
> -	err = __intel_wait_for_register_fw(uncore,
> -					   GEN6_GDRST, hw_domain_mask, 0,
> -					   500, 0,
> -					   NULL);
> +		/*
> +		 * Wait for the device to ack the reset requests.
> +		 *
> +		 * On some platforms, e.g. Jasperlake, we see that the
> +		 * engine register state is not cleared until shortly after
> +		 * GDRST reports completion, causing a failure as we try
> +		 * to immediately resume while the internal state is still
> +		 * in flux. If we immediately repeat the reset, the second
> +		 * reset appears to serialise with the first, and since
> +		 * it is a no-op, the registers should retain their reset
> +		 * value. However, there is still a concern that upon
> +		 * leaving the second reset, the internal engine state
> +		 * is still in flux and not ready for resuming.
> +		 */
> +		err = __intel_wait_for_register_fw(uncore, GEN6_GDRST,
> +						   hw_domain_mask, 0,
> +						   2000, 0,
> +						   NULL);
> +	} while (err == 0 && --loops);
>  	if (err)
>  		GT_TRACE(gt,
>  			 "Wait for 0x%08x engines reset failed\n",
>  			 hw_domain_mask);
>  
> +	/*
> +	 * As we have observed that the engine state is still volatile
> +	 * after GDRST is acked, impose a small delay to let everything settle.
> +	 */
> +	udelay(50);
> +
>  	return err;
>  }
>  
> -- 
> 2.38.1
>
