* [PATCH v2 0/2] Reset GuC and retry on fw load failure
@ 2016-03-08 11:38 Arun Siluvery
2016-03-08 11:38 ` [PATCH v2 1/2] drm/i915: Add function to reset an engine domain Arun Siluvery
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Arun Siluvery @ 2016-03-08 11:38 UTC (permalink / raw)
To: intel-gfx; +Cc: Mika Kuoppala
The changes below add a mechanism to reset the GuC and retry fw loading if the
initial load fails. Certain HW issues can cause the fw load to fail, and the
WA is to retry after resetting the GuC.
A patch required for engine reset is partially reused, as we use that
function to reset the GuC.
v2: use the updated engine reset helper function.
Arun Siluvery (2):
drm/i915: Add function to reset an engine domain
drm/i915/guc: Reset GuC and retry on firmware load failure
drivers/gpu/drm/i915/i915_drv.h | 1 +
drivers/gpu/drm/i915/i915_guc_reg.h | 1 +
drivers/gpu/drm/i915/i915_reg.h | 1 +
drivers/gpu/drm/i915/intel_guc_loader.c | 49 +++++++++++++++++++++++++++++++--
drivers/gpu/drm/i915/intel_uncore.c | 34 +++++++++++++++++++++++
5 files changed, 84 insertions(+), 2 deletions(-)
--
1.9.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
* [PATCH v2 1/2] drm/i915: Add function to reset an engine domain
2016-03-08 11:38 [PATCH v2 0/2] Reset GuC and retry on fw load failure Arun Siluvery
@ 2016-03-08 11:38 ` Arun Siluvery
2016-03-08 11:38 ` [PATCH v2 2/2] drm/i915/guc: Reset GuC and retry on firmware load failure Arun Siluvery
2016-03-08 11:55 ` ✗ Fi.CI.BAT: warning for Reset GuC and retry on fw load failure (rev2) Patchwork
2 siblings, 0 replies; 6+ messages in thread
From: Arun Siluvery @ 2016-03-08 11:38 UTC (permalink / raw)
To: intel-gfx; +Cc: Mika Kuoppala
Partial port of a patch from Mika that modifies the reset function to handle
per-engine resets. A domain reset function is introduced which accepts a
mask of all domains to be reset. For a per-engine reset only a single
engine domain is specified, whereas for a legacy full GPU reset all engine
domains are specified.
This change also supports resetting the GuC, which is required for some of the
WAs where the fw load can fail and we retry after resetting the GuC.
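
For readers outside the driver, the mask-based reset described above can be sketched in user space as follows. This is only an illustration under stated assumptions: `struct fake_hw`, `fake_hw_tick()`, the `DOM_*` bit values, the poll budget, and the `-1` return are all hypothetical stand-ins, not the i915 code, which writes a real MMIO register and waits with a microsecond timeout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical domain bits; only the idea of one bit per reset domain
 * comes from the patch description. */
#define DOM_RENDER      (1u << 1)
#define DOM_MEDIA       (1u << 2)
#define DOM_BLT         (1u << 3)
#define DOM_ALL_ENGINES (DOM_RENDER | DOM_MEDIA | DOM_BLT)

/* Simulated reset register: software sets bits, "hardware" clears them. */
struct fake_hw {
	uint32_t gdrst;      /* pending reset requests */
	uint32_t stuck_mask; /* domains that never ack, to exercise the timeout */
};

/* Stand-in for one hardware tick: ack every pending domain that is not stuck. */
static void fake_hw_tick(struct fake_hw *hw)
{
	hw->gdrst &= hw->stuck_mask;
}

/*
 * Request a reset of every domain in mask, then poll until all requested
 * bits read back as zero. Returns 0 on success, -1 once max_polls expires.
 */
static int domain_reset(struct fake_hw *hw, uint32_t mask, int max_polls)
{
	assert(max_polls > 0);

	hw->gdrst = mask;
	while (max_polls--) {
		fake_hw_tick(hw);
		if ((hw->gdrst & mask) == 0)
			return 0;
	}
	return -1;
}
```

In this sketch a per-engine reset would pass a single bit such as `DOM_RENDER`, while a full reset would pass `DOM_ALL_ENGINES`; the same helper serves both because it only waits for the bits it requested.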
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
drivers/gpu/drm/i915/intel_uncore.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index d31447f..80e38d5 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1510,6 +1510,21 @@ static int ironlake_do_reset(struct drm_device *dev)
return 0;
}
+static int gen6_domain_reset(struct drm_i915_private *dev_priv,
+ u32 hw_domain_mask)
+{
+ int ret;
+
+ __raw_i915_write32(dev_priv, GEN6_GDRST, hw_domain_mask);
+
+#define ACKED ((__raw_i915_read32(dev_priv, GEN6_GDRST) & hw_domain_mask) == 0)
+ /* Spin waiting for the device to ack the reset requests */
+ ret = wait_for_atomic_us(ACKED, 500);
+#undef ACKED
+
+ return ret;
+}
+
static int gen6_do_reset(struct drm_device *dev)
{
struct drm_i915_private *dev_priv = dev->dev_private;
--
1.9.1
* [PATCH v2 2/2] drm/i915/guc: Reset GuC and retry on firmware load failure
2016-03-08 11:38 [PATCH v2 0/2] Reset GuC and retry on fw load failure Arun Siluvery
2016-03-08 11:38 ` [PATCH v2 1/2] drm/i915: Add function to reset an engine domain Arun Siluvery
@ 2016-03-08 11:38 ` Arun Siluvery
2016-03-10 17:57 ` Yu Dai
2016-03-08 11:55 ` ✗ Fi.CI.BAT: warning for Reset GuC and retry on fw load failure (rev2) Patchwork
2 siblings, 1 reply; 6+ messages in thread
From: Arun Siluvery @ 2016-03-08 11:38 UTC (permalink / raw)
To: intel-gfx; +Cc: Mika Kuoppala
Due to timing issues in the HW, some of the status bits required for GuC
authentication are occasionally not set. When that happens, the GuC cannot
be initialized and we are left with a wedged GPU. The suggested WA is
to perform a soft reset of the GuC and attempt to reload the fw a few
times before giving up.
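
The control flow of this WA can be sketched in isolation as shown here. It is a simplified model, not the driver code: `struct fake_guc`, `fake_reset()`, and `fake_load()` are hypothetical stand-ins for the hardware, and only the retry count of 3 and the reset-then-reload ordering are taken from the patch.

```c
#include <assert.h>

#define MAX_RETRIES 3 /* mirrors the retry count used in the patch */

/* Hypothetical device model: loads fail until fails_left reaches zero. */
struct fake_guc {
	int fails_left; /* number of upcoming load attempts that will fail */
	int resets;     /* how many resets were issued */
	int loads;      /* how many load attempts were made */
};

/* Stand-in for the GuC soft reset; always succeeds in this model. */
static int fake_reset(struct fake_guc *guc)
{
	guc->resets++;
	return 0;
}

/* Stand-in for the fw transfer; fails while fails_left is positive. */
static int fake_load(struct fake_guc *guc)
{
	guc->loads++;
	if (guc->fails_left > 0) {
		guc->fails_left--;
		return -1;
	}
	return 0;
}

/* Try once; on failure, reset and reload up to MAX_RETRIES times. */
static int load_with_retry(struct fake_guc *guc)
{
	int retries = MAX_RETRIES;
	int err;

	assert(guc);

	err = fake_load(guc);
	while (err && retries--) {
		err = fake_reset(guc);
		if (err)
			break; /* a failed reset is not retried */
		err = fake_load(guc);
	}
	return err;
}
```

With this shape the common case costs a single load attempt and no reset, and the worst case is bounded at one initial attempt plus `MAX_RETRIES` reset and reload pairs before the error is reported to the caller.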
As the failure is timing dependent, tests that repeatedly triggered a manual
full GPU reset (i915_wedged) sometimes hit it after several thousand
iterations and sometimes ran even longer without any issues. The reset and
reload mechanism proved helpful whenever we did hit a fw load failure, so it
is better to include this to improve driver stability.
This change implements the following WAs:
WaEnableuKernelHeaderValidFix:skl,bxt
WaEnableGuCBootHashCheckNotSet:skl,bxt
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Alex Dai <yu.dai@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
drivers/gpu/drm/i915/i915_drv.h | 1 +
drivers/gpu/drm/i915/i915_guc_reg.h | 1 +
drivers/gpu/drm/i915/i915_reg.h | 1 +
drivers/gpu/drm/i915/intel_guc_loader.c | 49 +++++++++++++++++++++++++++++++--
drivers/gpu/drm/i915/intel_uncore.c | 19 +++++++++++++
5 files changed, 69 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index f37ac12..0df7c82 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2757,6 +2757,7 @@ extern long i915_compat_ioctl(struct file *filp, unsigned int cmd,
extern int intel_gpu_reset(struct drm_device *dev);
extern bool intel_has_gpu_reset(struct drm_device *dev);
extern int i915_reset(struct drm_device *dev);
+extern int intel_guc_reset(struct drm_i915_private *dev_priv);
extern unsigned long i915_chipset_val(struct drm_i915_private *dev_priv);
extern unsigned long i915_mch_val(struct drm_i915_private *dev_priv);
extern unsigned long i915_gfx_val(struct drm_i915_private *dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_guc_reg.h b/drivers/gpu/drm/i915/i915_guc_reg.h
index e4ba582..94ceee5 100644
--- a/drivers/gpu/drm/i915/i915_guc_reg.h
+++ b/drivers/gpu/drm/i915/i915_guc_reg.h
@@ -27,6 +27,7 @@
/* Definitions of GuC H/W registers, bits, etc */
#define GUC_STATUS _MMIO(0xc000)
+#define GS_MIA_IN_RESET (1 << 0)
#define GS_BOOTROM_SHIFT 1
#define GS_BOOTROM_MASK (0x7F << GS_BOOTROM_SHIFT)
#define GS_BOOTROM_RSA_FAILED (0x50 << GS_BOOTROM_SHIFT)
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 7dfc400..48a23de 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -164,6 +164,7 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
#define GEN6_GRDOM_RENDER (1 << 1)
#define GEN6_GRDOM_MEDIA (1 << 2)
#define GEN6_GRDOM_BLT (1 << 3)
+#define GEN9_GRDOM_GUC (1 << 5)
#define RING_PP_DIR_BASE(ring) _MMIO((ring)->mmio_base+0x228)
#define RING_PP_DIR_BASE_READ(ring) _MMIO((ring)->mmio_base+0x518)
diff --git a/drivers/gpu/drm/i915/intel_guc_loader.c b/drivers/gpu/drm/i915/intel_guc_loader.c
index 82a3c03..f9cb814 100644
--- a/drivers/gpu/drm/i915/intel_guc_loader.c
+++ b/drivers/gpu/drm/i915/intel_guc_loader.c
@@ -353,6 +353,24 @@ static int guc_ucode_xfer(struct drm_i915_private *dev_priv)
return ret;
}
+static int i915_reset_guc(struct drm_i915_private *dev_priv)
+{
+ int ret;
+ u32 guc_status;
+
+ ret = intel_guc_reset(dev_priv);
+ if (ret) {
+ DRM_ERROR("GuC reset failed, ret = %d\n", ret);
+ return ret;
+ }
+
+ guc_status = I915_READ(GUC_STATUS);
+ WARN(!(guc_status & GS_MIA_IN_RESET),
+ "GuC status: 0x%x, MIA core expected to be in reset\n", guc_status);
+
+ return ret;
+}
+
/**
* intel_guc_ucode_load() - load GuC uCode into the device
* @dev: drm device
@@ -417,9 +435,36 @@ int intel_guc_ucode_load(struct drm_device *dev)
if (err)
goto fail;
+ /*
+ * WaEnableuKernelHeaderValidFix:skl,bxt
+ * For BXT, this is only upto B0 but below WA is required for later
+ * steppings also so this is extended as well.
+ */
+ /* WaEnableGuCBootHashCheckNotSet:skl,bxt */
err = guc_ucode_xfer(dev_priv);
- if (err)
- goto fail;
+ if (err) {
+ int retries = 3;
+
+ DRM_ERROR("GuC fw load failed, err=%d, attempting reset and retry\n", err);
+
+ while (retries--) {
+ err = i915_reset_guc(dev_priv);
+ if (err)
+ break;
+
+ err = guc_ucode_xfer(dev_priv);
+ if (!err) {
+ DRM_DEBUG_DRIVER("GuC fw reload succeeded after reset\n");
+ break;
+ }
+ DRM_DEBUG_DRIVER("GuC fw reload retries left: %d\n", retries);
+ }
+
+ if (err) {
+ DRM_ERROR("GuC fw reload attempt failed, ret=%d\n", err);
+ goto fail;
+ }
+ }
guc_fw->guc_fw_load_status = GUC_FIRMWARE_SUCCESS;
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 80e38d5..92be76a 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1647,6 +1647,25 @@ bool intel_has_gpu_reset(struct drm_device *dev)
return intel_get_gpu_reset(dev) != NULL;
}
+int intel_guc_reset(struct drm_i915_private *dev_priv)
+{
+ int ret;
+ unsigned long irqflags;
+
+ if (!i915.enable_guc_submission)
+ return -EINVAL;
+
+ intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
+ spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
+
+ ret = gen6_domain_reset(dev_priv, GEN9_GRDOM_GUC);
+
+ spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
+ intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
+
+ return ret;
+}
+
bool intel_uncore_unclaimed_mmio(struct drm_i915_private *dev_priv)
{
return check_for_unclaimed_mmio(dev_priv);
--
1.9.1
* ✗ Fi.CI.BAT: warning for Reset GuC and retry on fw load failure (rev2)
2016-03-08 11:38 [PATCH v2 0/2] Reset GuC and retry on fw load failure Arun Siluvery
2016-03-08 11:38 ` [PATCH v2 1/2] drm/i915: Add function to reset an engine domain Arun Siluvery
2016-03-08 11:38 ` [PATCH v2 2/2] drm/i915/guc: Reset GuC and retry on firmware load failure Arun Siluvery
@ 2016-03-08 11:55 ` Patchwork
2 siblings, 0 replies; 6+ messages in thread
From: Patchwork @ 2016-03-08 11:55 UTC (permalink / raw)
To: arun.siluvery; +Cc: intel-gfx
== Series Details ==
Series: Reset GuC and retry on fw load failure (rev2)
URL : https://patchwork.freedesktop.org/series/3985/
State : warning
== Summary ==
Series 3985v2 Reset GuC and retry on fw load failure
http://patchwork.freedesktop.org/api/1.0/series/3985/revisions/2/mbox/
Test kms_flip:
Subgroup basic-flip-vs-dpms:
pass -> DMESG-WARN (hsw-brixbox)
pass -> DMESG-WARN (bdw-ultra)
pass -> DMESG-WARN (ilk-hp8440p) UNSTABLE
Subgroup basic-flip-vs-modeset:
pass -> DMESG-WARN (ilk-hp8440p) UNSTABLE
Subgroup basic-plain-flip:
pass -> DMESG-WARN (hsw-brixbox)
Test kms_pipe_crc_basic:
Subgroup nonblocking-crc-pipe-a:
dmesg-warn -> PASS (snb-x220t)
Subgroup read-crc-pipe-b:
dmesg-warn -> PASS (hsw-gt2)
Subgroup suspend-read-crc-pipe-c:
pass -> DMESG-WARN (bsw-nuc-2)
Test pm_rpm:
Subgroup basic-pci-d3-state:
dmesg-fail -> FAIL (snb-x220t)
pass -> DMESG-WARN (snb-dellxps)
Subgroup basic-rte:
pass -> DMESG-WARN (snb-x220t)
dmesg-warn -> PASS (snb-dellxps)
bdw-nuci7 total:183 pass:172 dwarn:0 dfail:0 fail:0 skip:11
bdw-ultra total:183 pass:164 dwarn:1 dfail:0 fail:0 skip:18
bsw-nuc-2 total:183 pass:148 dwarn:1 dfail:0 fail:0 skip:34
byt-nuc total:183 pass:152 dwarn:0 dfail:0 fail:0 skip:31
hsw-brixbox total:183 pass:162 dwarn:2 dfail:0 fail:0 skip:19
hsw-gt2 total:183 pass:169 dwarn:0 dfail:0 fail:0 skip:14
ilk-hp8440p total:183 pass:123 dwarn:2 dfail:0 fail:0 skip:58
ivb-t430s total:183 pass:162 dwarn:0 dfail:0 fail:0 skip:21
skl-i5k-2 total:183 pass:163 dwarn:0 dfail:0 fail:0 skip:20
skl-i7k-2 total:183 pass:163 dwarn:0 dfail:0 fail:0 skip:20
snb-dellxps total:183 pass:153 dwarn:1 dfail:0 fail:0 skip:29
snb-x220t total:183 pass:153 dwarn:1 dfail:0 fail:1 skip:28
Results at /archive/results/CI_IGT_test/Patchwork_1541/
c4b2696531e375d94d135fc5e5e7b3072d92f141 drm-intel-nightly: 2016y-03m-08d-09h-54m-48s UTC integration manifest
e97ff08db3702f73a9aa92a6c6a844a41af52272 drm/i915/guc: Reset GuC and retry on firmware load failure
a20da4698b60d9c818fb67a9bc62f751074a2c60 drm/i915: Add function to reset an engine domain
* Re: [PATCH v2 2/2] drm/i915/guc: Reset GuC and retry on firmware load failure
2016-03-08 11:38 ` [PATCH v2 2/2] drm/i915/guc: Reset GuC and retry on firmware load failure Arun Siluvery
@ 2016-03-10 17:57 ` Yu Dai
2016-03-11 11:03 ` Arun Siluvery
0 siblings, 1 reply; 6+ messages in thread
From: Yu Dai @ 2016-03-10 17:57 UTC (permalink / raw)
To: Arun Siluvery, intel-gfx; +Cc: Mika Kuoppala
LGTM. Reviewed-by: Alex Dai <yu.dai@intel.com>
* Re: [PATCH v2 2/2] drm/i915/guc: Reset GuC and retry on firmware load failure
2016-03-10 17:57 ` Yu Dai
@ 2016-03-11 11:03 ` Arun Siluvery
0 siblings, 0 replies; 6+ messages in thread
From: Arun Siluvery @ 2016-03-11 11:03 UTC (permalink / raw)
To: Yu Dai, intel-gfx; +Cc: Mika Kuoppala
On 10/03/2016 17:57, Yu Dai wrote:
> LGTM. Reviewed-by: Alex Dai <yu.dai@intel.com>
>
Thanks for the review.
This patch depends on patch 1, which is being reworked. I will rebase and
send this to the list again once all dependencies are resolved.
regards
Arun