* [PATCH 0/7] Fixes and worarounds for GuC issues
@ 2016-03-21 10:16 Dave Gordon
2016-03-21 10:16 ` [PATCH 1/7] drm/i915/guc: Reset GuC and retry on firmware load failure Dave Gordon
` (8 more replies)
0 siblings, 9 replies; 16+ messages in thread
From: Dave Gordon @ 2016-03-21 10:16 UTC (permalink / raw)
To: intel-gfx
Several issues have been found around the general area of resetting
and reloading the GuC. These include the failure of the DMA engine
to capture header data and/or flag the result of the data hashing
operation performed during transfer. Another area of concern relates to
the hibernate-and-resume cycle, where the firmware reload may fail or
the doorbell hardware may be left in an inconsistant state.
This set of patches provides solutions (or in some cases workarounds)
for the various issues identified in this area.
Arun Siluvery (1):
drm/i915/guc: Reset GuC and retry on firmware load failure
Dave Gordon (6):
drm/i915/guc: always reset GuC before loading firmware
drm/i915/guc: add doorbell map to debugfs/i915_guc_info
drm/i915/guc: move guc_ring_doorbell() nearer to callsite
drm/i915/guc: refactor doorbell management code
drm/i915/guc: (re)initialise doorbell h/w when enabling GuC submission
drm/i915/guc: disable GuC submission earlier during GuC (re)load
drivers/gpu/drm/i915/i915_debugfs.c | 7 +
drivers/gpu/drm/i915/i915_drv.h | 1 +
drivers/gpu/drm/i915/i915_guc_reg.h | 11 +-
drivers/gpu/drm/i915/i915_guc_submission.c | 264 ++++++++++++++++++-----------
drivers/gpu/drm/i915/i915_reg.h | 1 +
drivers/gpu/drm/i915/intel_guc_loader.c | 54 +++++-
drivers/gpu/drm/i915/intel_uncore.c | 19 +++
7 files changed, 252 insertions(+), 105 deletions(-)
Cc: Alex Dai <yu.dai@intel.com>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
--
1.9.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 1/7] drm/i915/guc: Reset GuC and retry on firmware load failure
2016-03-21 10:16 [PATCH 0/7] Fixes and worarounds for GuC issues Dave Gordon
@ 2016-03-21 10:16 ` Dave Gordon
2016-03-21 16:58 ` Arun Siluvery
2016-03-21 10:16 ` [PATCH 2/7] drm/i915/guc: always reset GuC before loading firmware Dave Gordon
` (7 subsequent siblings)
8 siblings, 1 reply; 16+ messages in thread
From: Dave Gordon @ 2016-03-21 10:16 UTC (permalink / raw)
To: intel-gfx
From: Arun Siluvery <arun.siluvery@linux.intel.com>
Due to timing issues in the HW some of the status bits required for GuC
authentication doesn't get set occassionally, when that happens, GuC cannot
be initialized and we will be left with a wedged GPU. The WA suggested is
to perform a soft reset of GuC and attempt to reload the fw again for few
times before giving up.
As the failure is dependent on timing, tests performed by triggering manual
full gpu reset (i915_wedged) showed that we could sometimes hit this after
several thousand iterations but sometimes tests ran even longer without any
issues. Reset and reload mechanism proved helpful when we indeed hit fw
load failure so it is better to include this to improve driver stability.
This change implements the following WA,
WaEnableuKernelHeaderValidFix:skl,bxt
WaEnableGuCBootHashCheckNotSet:skl,bxt
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Alex Dai <yu.dai@intel.com>
---
drivers/gpu/drm/i915/i915_drv.h | 1 +
drivers/gpu/drm/i915/i915_guc_reg.h | 1 +
drivers/gpu/drm/i915/i915_reg.h | 1 +
drivers/gpu/drm/i915/intel_guc_loader.c | 49 +++++++++++++++++++++++++++++++--
drivers/gpu/drm/i915/intel_uncore.c | 19 +++++++++++++
5 files changed, 69 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 00c41a4..5418850 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2711,6 +2711,7 @@ extern long i915_compat_ioctl(struct file *filp, unsigned int cmd,
extern int intel_gpu_reset(struct drm_device *dev, u32 engine_mask);
extern bool intel_has_gpu_reset(struct drm_device *dev);
extern int i915_reset(struct drm_device *dev);
+extern int intel_guc_reset(struct drm_i915_private *dev_priv);
extern unsigned long i915_chipset_val(struct drm_i915_private *dev_priv);
extern unsigned long i915_mch_val(struct drm_i915_private *dev_priv);
extern unsigned long i915_gfx_val(struct drm_i915_private *dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_guc_reg.h b/drivers/gpu/drm/i915/i915_guc_reg.h
index e4ba582..94ceee5 100644
--- a/drivers/gpu/drm/i915/i915_guc_reg.h
+++ b/drivers/gpu/drm/i915/i915_guc_reg.h
@@ -27,6 +27,7 @@
/* Definitions of GuC H/W registers, bits, etc */
#define GUC_STATUS _MMIO(0xc000)
+#define GS_MIA_IN_RESET (1 << 0)
#define GS_BOOTROM_SHIFT 1
#define GS_BOOTROM_MASK (0x7F << GS_BOOTROM_SHIFT)
#define GS_BOOTROM_RSA_FAILED (0x50 << GS_BOOTROM_SHIFT)
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 07e0449..cc71ca2 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -165,6 +165,7 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
#define GEN6_GRDOM_MEDIA (1 << 2)
#define GEN6_GRDOM_BLT (1 << 3)
#define GEN6_GRDOM_VECS (1 << 4)
+#define GEN9_GRDOM_GUC (1 << 5)
#define GEN8_GRDOM_MEDIA2 (1 << 7)
#define RING_PP_DIR_BASE(ring) _MMIO((ring)->mmio_base+0x228)
diff --git a/drivers/gpu/drm/i915/intel_guc_loader.c b/drivers/gpu/drm/i915/intel_guc_loader.c
index e1aff62..a07c228 100644
--- a/drivers/gpu/drm/i915/intel_guc_loader.c
+++ b/drivers/gpu/drm/i915/intel_guc_loader.c
@@ -353,6 +353,24 @@ static int guc_ucode_xfer(struct drm_i915_private *dev_priv)
return ret;
}
+static int i915_reset_guc(struct drm_i915_private *dev_priv)
+{
+ int ret;
+ u32 guc_status;
+
+ ret = intel_guc_reset(dev_priv);
+ if (ret) {
+ DRM_ERROR("GuC reset failed, ret = %d\n", ret);
+ return ret;
+ }
+
+ guc_status = I915_READ(GUC_STATUS);
+ WARN(!(guc_status & GS_MIA_IN_RESET),
+ "GuC status: 0x%x, MIA core expected to be in reset\n", guc_status);
+
+ return ret;
+}
+
/**
* intel_guc_ucode_load() - load GuC uCode into the device
* @dev: drm device
@@ -417,9 +435,36 @@ int intel_guc_ucode_load(struct drm_device *dev)
if (err)
goto fail;
+ /*
+ * WaEnableuKernelHeaderValidFix:skl,bxt
+ * For BXT, this is only upto B0 but below WA is required for later
+ * steppings also so this is extended as well.
+ */
+ /* WaEnableGuCBootHashCheckNotSet:skl,bxt */
err = guc_ucode_xfer(dev_priv);
- if (err)
- goto fail;
+ if (err) {
+ int retries = 3;
+
+ DRM_ERROR("GuC fw load failed, err=%d, attempting reset and retry\n", err);
+
+ while (retries--) {
+ err = i915_reset_guc(dev_priv);
+ if (err)
+ break;
+
+ err = guc_ucode_xfer(dev_priv);
+ if (!err) {
+ DRM_DEBUG_DRIVER("GuC fw reload succeeded after reset\n");
+ break;
+ }
+ DRM_DEBUG_DRIVER("GuC fw reload retries left: %d\n", retries);
+ }
+
+ if (err) {
+ DRM_ERROR("GuC fw reload attempt failed, ret=%d\n", err);
+ goto fail;
+ }
+ }
guc_fw->guc_fw_load_status = GUC_FIRMWARE_SUCCESS;
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 512b7fa..d44e07e 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1673,6 +1673,25 @@ bool intel_has_gpu_reset(struct drm_device *dev)
return intel_get_gpu_reset(dev) != NULL;
}
+int intel_guc_reset(struct drm_i915_private *dev_priv)
+{
+ int ret;
+ unsigned long irqflags;
+
+ if (!i915.enable_guc_submission)
+ return -EINVAL;
+
+ intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
+ spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
+
+ ret = gen6_hw_domain_reset(dev_priv, GEN9_GRDOM_GUC);
+
+ spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
+ intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
+
+ return ret;
+}
+
bool intel_uncore_unclaimed_mmio(struct drm_i915_private *dev_priv)
{
return check_for_unclaimed_mmio(dev_priv);
--
1.9.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 2/7] drm/i915/guc: always reset GuC before loading firmware
2016-03-21 10:16 [PATCH 0/7] Fixes and worarounds for GuC issues Dave Gordon
2016-03-21 10:16 ` [PATCH 1/7] drm/i915/guc: Reset GuC and retry on firmware load failure Dave Gordon
@ 2016-03-21 10:16 ` Dave Gordon
2016-03-22 13:32 ` Arun Siluvery
2016-03-21 10:16 ` [PATCH 3/7] drm/i915/guc: add doorbell map to debugfs/i915_guc_info Dave Gordon
` (6 subsequent siblings)
8 siblings, 1 reply; 16+ messages in thread
From: Dave Gordon @ 2016-03-21 10:16 UTC (permalink / raw)
To: intel-gfx
After a suspend-resume cycle, the resumed kernel has no idea what the
booted kernel may have done to the GuC before replacing itself with the
resumed image. In particular, it may have already loaded the GuC with
firmware, which will then cause this kernel's attempt to (re)load the
firmware to fail (GuC program memory is write-once!).
So here we choose to always reset the GuC just before (re)loading the
firmware, so the hardware should then be in a well-known state, and we
may even avoid some of the issues arising from unpredictable timing.
Also added some more fields & values to the definition of the GUC_STATUS
register, which is the key diagnostic indicator if the GuC load fails.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Cc: Alex Dai <yu.dai@intel.com>
---
drivers/gpu/drm/i915/i915_guc_reg.h | 12 ++++++++--
drivers/gpu/drm/i915/intel_guc_loader.c | 40 ++++++++++++++++-----------------
2 files changed, 29 insertions(+), 23 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_guc_reg.h b/drivers/gpu/drm/i915/i915_guc_reg.h
index 94ceee5..80786d9 100644
--- a/drivers/gpu/drm/i915/i915_guc_reg.h
+++ b/drivers/gpu/drm/i915/i915_guc_reg.h
@@ -27,10 +27,12 @@
/* Definitions of GuC H/W registers, bits, etc */
#define GUC_STATUS _MMIO(0xc000)
-#define GS_MIA_IN_RESET (1 << 0)
+#define GS_RESET_SHIFT 0
+#define GS_MIA_IN_RESET (0x01 << GS_RESET_SHIFT)
#define GS_BOOTROM_SHIFT 1
#define GS_BOOTROM_MASK (0x7F << GS_BOOTROM_SHIFT)
#define GS_BOOTROM_RSA_FAILED (0x50 << GS_BOOTROM_SHIFT)
+#define GS_BOOTROM_JUMP_PASSED (0x76 << GS_BOOTROM_SHIFT)
#define GS_UKERNEL_SHIFT 8
#define GS_UKERNEL_MASK (0xFF << GS_UKERNEL_SHIFT)
#define GS_UKERNEL_LAPIC_DONE (0x30 << GS_UKERNEL_SHIFT)
@@ -38,7 +40,13 @@
#define GS_UKERNEL_READY (0xF0 << GS_UKERNEL_SHIFT)
#define GS_MIA_SHIFT 16
#define GS_MIA_MASK (0x07 << GS_MIA_SHIFT)
-#define GS_MIA_CORE_STATE (1 << GS_MIA_SHIFT)
+#define GS_MIA_CORE_STATE (0x01 << GS_MIA_SHIFT)
+#define GS_MIA_HALT_REQUESTED (0x02 << GS_MIA_SHIFT)
+#define GS_MIA_ISR_ENTRY (0x04 << GS_MIA_SHIFT)
+#define GS_AUTH_STATUS_SHIFT 30
+#define GS_AUTH_STATUS_MASK (0x03 << GS_AUTH_STATUS_SHIFT)
+#define GS_AUTH_STATUS_BAD (0x01 << GS_AUTH_STATUS_SHIFT)
+#define GS_AUTH_STATUS_GOOD (0x02 << GS_AUTH_STATUS_SHIFT)
#define SOFT_SCRATCH(n) _MMIO(0xc180 + (n) * 4)
#define SOFT_SCRATCH_COUNT 16
diff --git a/drivers/gpu/drm/i915/intel_guc_loader.c b/drivers/gpu/drm/i915/intel_guc_loader.c
index a07c228..a875936 100644
--- a/drivers/gpu/drm/i915/intel_guc_loader.c
+++ b/drivers/gpu/drm/i915/intel_guc_loader.c
@@ -387,7 +387,7 @@ int intel_guc_ucode_load(struct drm_device *dev)
{
struct drm_i915_private *dev_priv = dev->dev_private;
struct intel_guc_fw *guc_fw = &dev_priv->guc.guc_fw;
- int err = 0;
+ int retries, err = 0;
if (!i915.enable_guc_submission)
return 0;
@@ -441,29 +441,26 @@ int intel_guc_ucode_load(struct drm_device *dev)
* steppings also so this is extended as well.
*/
/* WaEnableGuCBootHashCheckNotSet:skl,bxt */
- err = guc_ucode_xfer(dev_priv);
- if (err) {
- int retries = 3;
-
- DRM_ERROR("GuC fw load failed, err=%d, attempting reset and retry\n", err);
-
- while (retries--) {
- err = i915_reset_guc(dev_priv);
- if (err)
- break;
-
- err = guc_ucode_xfer(dev_priv);
- if (!err) {
- DRM_DEBUG_DRIVER("GuC fw reload succeeded after reset\n");
- break;
- }
- DRM_DEBUG_DRIVER("GuC fw reload retries left: %d\n", retries);
- }
-
+ for (retries = 3; ; ) {
+ /*
+ * Always reset the GuC just before (re)loading, so
+ * that the state and timing are fairly predictable
+ */
+ err = i915_reset_guc(dev_priv);
if (err) {
- DRM_ERROR("GuC fw reload attempt failed, ret=%d\n", err);
+ DRM_ERROR("GuC reset failed, err %d\n", err);
goto fail;
}
+
+ err = guc_ucode_xfer(dev_priv);
+ if (!err)
+ break;
+
+ if (--retries == 0)
+ goto fail;
+
+ DRM_INFO("GuC fw load failed, err %d; will reset and "
+ "retry %d more time(s)\n", err, retries);
}
guc_fw->guc_fw_load_status = GUC_FIRMWARE_SUCCESS;
@@ -485,6 +482,7 @@ int intel_guc_ucode_load(struct drm_device *dev)
return 0;
fail:
+ DRM_ERROR("GuC firmware load failed, err %d\n", err);
if (guc_fw->guc_fw_load_status == GUC_FIRMWARE_PENDING)
guc_fw->guc_fw_load_status = GUC_FIRMWARE_FAIL;
--
1.9.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 3/7] drm/i915/guc: add doorbell map to debugfs/i915_guc_info
2016-03-21 10:16 [PATCH 0/7] Fixes and worarounds for GuC issues Dave Gordon
2016-03-21 10:16 ` [PATCH 1/7] drm/i915/guc: Reset GuC and retry on firmware load failure Dave Gordon
2016-03-21 10:16 ` [PATCH 2/7] drm/i915/guc: always reset GuC before loading firmware Dave Gordon
@ 2016-03-21 10:16 ` Dave Gordon
2016-03-21 10:16 ` [PATCH 4/7] drm/i915/guc: move guc_ring_doorbell() nearer to callsite Dave Gordon
` (5 subsequent siblings)
8 siblings, 0 replies; 16+ messages in thread
From: Dave Gordon @ 2016-03-21 10:16 UTC (permalink / raw)
To: intel-gfx
To properly verify the driver->doorbell->GuC functionality, validation
needs to know how the driver has assigned the doorbell cache lines and
registers, so make them visible through debugfs.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Cc: Alex Dai <yu.dai@intel.com>
---
drivers/gpu/drm/i915/i915_debugfs.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index ccdca2c..6678c15 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2499,6 +2499,13 @@ static int i915_guc_info(struct seq_file *m, void *data)
mutex_unlock(&dev->struct_mutex);
+ seq_printf(m, "Doorbell map:\n");
+ for (i = 0; i < BITS_TO_LONGS(GUC_MAX_DOORBELLS) - 3; i += 4)
+ seq_printf(m, "\t%016lx %016lx %016lx %016lx\n",
+ guc.doorbell_bitmap[i], guc.doorbell_bitmap[i+1],
+ guc.doorbell_bitmap[i+2], guc.doorbell_bitmap[i+3]);
+ seq_printf(m, "Doorbell next cacheline: 0x%x\n\n", guc.db_cacheline);
+
seq_printf(m, "GuC total action count: %llu\n", guc.action_count);
seq_printf(m, "GuC action failure count: %u\n", guc.action_fail);
seq_printf(m, "GuC last action command: 0x%x\n", guc.action_cmd);
--
1.9.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 4/7] drm/i915/guc: move guc_ring_doorbell() nearer to callsite
2016-03-21 10:16 [PATCH 0/7] Fixes and worarounds for GuC issues Dave Gordon
` (2 preceding siblings ...)
2016-03-21 10:16 ` [PATCH 3/7] drm/i915/guc: add doorbell map to debugfs/i915_guc_info Dave Gordon
@ 2016-03-21 10:16 ` Dave Gordon
2016-03-21 10:16 ` [PATCH 5/7] drm/i915/guc: refactor doorbell management code Dave Gordon
` (4 subsequent siblings)
8 siblings, 0 replies; 16+ messages in thread
From: Dave Gordon @ 2016-03-21 10:16 UTC (permalink / raw)
To: intel-gfx
Just code movement, no actual change to the function. This is in
preparation for the next patch, which will reorganise all the other
doorbell code, but doesn't change this function. So let's shuffle it
down near its caller rather than leaving it mixed in with the setup
code. Unlike the doorbell management code, this function is somewhat
time-critical, so putting it near its caller may even yield a tiny
performance improvement.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Cc: Alex Dai <yu.dai@intel.com>
---
drivers/gpu/drm/i915/i915_guc_submission.c | 128 +++++++++++++++--------------
1 file changed, 67 insertions(+), 61 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index ae1f58d..2344baf 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -190,67 +190,6 @@ static void guc_init_doorbell(struct intel_guc *guc,
kunmap_atomic(base);
}
-static int guc_ring_doorbell(struct i915_guc_client *gc)
-{
- struct guc_process_desc *desc;
- union guc_doorbell_qw db_cmp, db_exc, db_ret;
- union guc_doorbell_qw *db;
- void *base;
- int attempt = 2, ret = -EAGAIN;
-
- base = kmap_atomic(i915_gem_object_get_page(gc->client_obj, 0));
- desc = base + gc->proc_desc_offset;
-
- /* Update the tail so it is visible to GuC */
- desc->tail = gc->wq_tail;
-
- /* current cookie */
- db_cmp.db_status = GUC_DOORBELL_ENABLED;
- db_cmp.cookie = gc->cookie;
-
- /* cookie to be updated */
- db_exc.db_status = GUC_DOORBELL_ENABLED;
- db_exc.cookie = gc->cookie + 1;
- if (db_exc.cookie == 0)
- db_exc.cookie = 1;
-
- /* pointer of current doorbell cacheline */
- db = base + gc->doorbell_offset;
-
- while (attempt--) {
- /* lets ring the doorbell */
- db_ret.value_qw = atomic64_cmpxchg((atomic64_t *)db,
- db_cmp.value_qw, db_exc.value_qw);
-
- /* if the exchange was successfully executed */
- if (db_ret.value_qw == db_cmp.value_qw) {
- /* db was successfully rung */
- gc->cookie = db_exc.cookie;
- ret = 0;
- break;
- }
-
- /* XXX: doorbell was lost and need to acquire it again */
- if (db_ret.db_status == GUC_DOORBELL_DISABLED)
- break;
-
- DRM_ERROR("Cookie mismatch. Expected %d, returned %d\n",
- db_cmp.cookie, db_ret.cookie);
-
- /* update the cookie to newly read cookie from GuC */
- db_cmp.cookie = db_ret.cookie;
- db_exc.cookie = db_ret.cookie + 1;
- if (db_exc.cookie == 0)
- db_exc.cookie = 1;
- }
-
- /* Finally, update the cached copy of the GuC's WQ head */
- gc->wq_head = desc->head;
-
- kunmap_atomic(base);
- return ret;
-}
-
static void guc_disable_doorbell(struct intel_guc *guc,
struct i915_guc_client *client)
{
@@ -471,6 +410,12 @@ static void guc_fini_ctx_desc(struct intel_guc *guc,
sizeof(desc) * client->ctx_index);
}
+/*
+ * Everything above here is concerned with setup & teardown, and is
+ * therefore not part of the somewhat time-critical batch-submission
+ * path of i915_guc_submit() below.
+ */
+
int i915_guc_wq_check_space(struct i915_guc_client *gc)
{
struct guc_process_desc *desc;
@@ -559,6 +504,67 @@ static int guc_add_workqueue_item(struct i915_guc_client *gc,
return 0;
}
+static int guc_ring_doorbell(struct i915_guc_client *gc)
+{
+ struct guc_process_desc *desc;
+ union guc_doorbell_qw db_cmp, db_exc, db_ret;
+ union guc_doorbell_qw *db;
+ void *base;
+ int attempt = 2, ret = -EAGAIN;
+
+ base = kmap_atomic(i915_gem_object_get_page(gc->client_obj, 0));
+ desc = base + gc->proc_desc_offset;
+
+ /* Update the tail so it is visible to GuC */
+ desc->tail = gc->wq_tail;
+
+ /* current cookie */
+ db_cmp.db_status = GUC_DOORBELL_ENABLED;
+ db_cmp.cookie = gc->cookie;
+
+ /* cookie to be updated */
+ db_exc.db_status = GUC_DOORBELL_ENABLED;
+ db_exc.cookie = gc->cookie + 1;
+ if (db_exc.cookie == 0)
+ db_exc.cookie = 1;
+
+ /* pointer of current doorbell cacheline */
+ db = base + gc->doorbell_offset;
+
+ while (attempt--) {
+ /* lets ring the doorbell */
+ db_ret.value_qw = atomic64_cmpxchg((atomic64_t *)db,
+ db_cmp.value_qw, db_exc.value_qw);
+
+ /* if the exchange was successfully executed */
+ if (db_ret.value_qw == db_cmp.value_qw) {
+ /* db was successfully rung */
+ gc->cookie = db_exc.cookie;
+ ret = 0;
+ break;
+ }
+
+ /* XXX: doorbell was lost and need to acquire it again */
+ if (db_ret.db_status == GUC_DOORBELL_DISABLED)
+ break;
+
+ DRM_ERROR("Cookie mismatch. Expected %d, returned %d\n",
+ db_cmp.cookie, db_ret.cookie);
+
+ /* update the cookie to newly read cookie from GuC */
+ db_cmp.cookie = db_ret.cookie;
+ db_exc.cookie = db_ret.cookie + 1;
+ if (db_exc.cookie == 0)
+ db_exc.cookie = 1;
+ }
+
+ /* Finally, update the cached copy of the GuC's WQ head */
+ gc->wq_head = desc->head;
+
+ kunmap_atomic(base);
+ return ret;
+}
+
/**
* i915_guc_submit() - Submit commands through GuC
* @client: the guc client where commands will go through
--
1.9.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 5/7] drm/i915/guc: refactor doorbell management code
2016-03-21 10:16 [PATCH 0/7] Fixes and worarounds for GuC issues Dave Gordon
` (3 preceding siblings ...)
2016-03-21 10:16 ` [PATCH 4/7] drm/i915/guc: move guc_ring_doorbell() nearer to callsite Dave Gordon
@ 2016-03-21 10:16 ` Dave Gordon
2016-03-21 10:16 ` [PATCH 6/7] drm/i915/guc: (re)initialise doorbell h/w when enabling GuC submission Dave Gordon
` (3 subsequent siblings)
8 siblings, 0 replies; 16+ messages in thread
From: Dave Gordon @ 2016-03-21 10:16 UTC (permalink / raw)
To: intel-gfx
During a hibernate/resume cycle, the driver, the GuC, and the doorbell
hardware can end up in inconsistent states. This patch refactors the
driver's handling and tracking of doorbells, in preparation for a later
one which will resolve the issue.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Cc: Alex Dai <yu.dai@intel.com>
---
drivers/gpu/drm/i915/i915_guc_submission.c | 88 ++++++++++++++++++------------
1 file changed, 53 insertions(+), 35 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index 2344baf..6b30207 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -175,8 +175,48 @@ static int host2guc_sample_forcewake(struct intel_guc *guc,
* client object which contains the page being used for the doorbell
*/
+static int guc_update_doorbell_id(struct i915_guc_client *client,
+ struct guc_doorbell_info *doorbell,
+ u16 new_id)
+{
+ struct sg_table *sg = client->guc->ctx_pool_obj->pages;
+ void *doorbell_bitmap = client->guc->doorbell_bitmap;
+ struct guc_context_desc desc;
+ size_t len;
+
+ if (client->doorbell_id != GUC_INVALID_DOORBELL_ID &&
+ test_bit(client->doorbell_id, doorbell_bitmap)) {
+ /* Deactivate the old doorbell */
+ doorbell->db_status = GUC_DOORBELL_DISABLED;
+ (void)host2guc_release_doorbell(client->guc, client);
+ clear_bit(client->doorbell_id, doorbell_bitmap);
+ }
+
+ /* Update the GuC's idea of the doorbell ID */
+ len = sg_pcopy_to_buffer(sg->sgl, sg->nents, &desc, sizeof(desc),
+ sizeof(desc) * client->ctx_index);
+ if (len != sizeof(desc))
+ return -EFAULT;
+ desc.db_id = new_id;
+ len = sg_pcopy_from_buffer(sg->sgl, sg->nents, &desc, sizeof(desc),
+ sizeof(desc) * client->ctx_index);
+ if (len != sizeof(desc))
+ return -EFAULT;
+
+ client->doorbell_id = new_id;
+ if (new_id == GUC_INVALID_DOORBELL_ID)
+ return 0;
+
+ /* Activate the new doorbell */
+ set_bit(client->doorbell_id, doorbell_bitmap);
+ doorbell->db_status = GUC_DOORBELL_ENABLED;
+ doorbell->cookie = 0;
+ return host2guc_allocate_doorbell(client->guc, client);
+}
+
static void guc_init_doorbell(struct intel_guc *guc,
- struct i915_guc_client *client)
+ struct i915_guc_client *client,
+ uint16_t db_id)
{
struct guc_doorbell_info *doorbell;
void *base;
@@ -184,8 +224,7 @@ static void guc_init_doorbell(struct intel_guc *guc,
base = kmap_atomic(i915_gem_object_get_page(client->client_obj, 0));
doorbell = base + client->doorbell_offset;
- doorbell->db_status = 1;
- doorbell->cookie = 0;
+ guc_update_doorbell_id(client, doorbell, db_id);
kunmap_atomic(base);
}
@@ -193,27 +232,16 @@ static void guc_init_doorbell(struct intel_guc *guc,
static void guc_disable_doorbell(struct intel_guc *guc,
struct i915_guc_client *client)
{
- struct drm_i915_private *dev_priv = guc_to_i915(guc);
struct guc_doorbell_info *doorbell;
void *base;
- i915_reg_t drbreg = GEN8_DRBREGL(client->doorbell_id);
- int value;
base = kmap_atomic(i915_gem_object_get_page(client->client_obj, 0));
doorbell = base + client->doorbell_offset;
- doorbell->db_status = 0;
+ guc_update_doorbell_id(client, doorbell, GUC_INVALID_DOORBELL_ID);
kunmap_atomic(base);
- I915_WRITE(drbreg, I915_READ(drbreg) & ~GEN8_DRB_VALID);
-
- value = I915_READ(drbreg);
- WARN_ON((value & GEN8_DRB_VALID) != 0);
-
- I915_WRITE(GEN8_DRBREGU(client->doorbell_id), 0);
- I915_WRITE(drbreg, 0);
-
/* XXX: wait for any interrupts */
/* XXX: wait for workqueue to drain */
}
@@ -260,7 +288,7 @@ static uint16_t assign_doorbell(struct intel_guc *guc, uint32_t priority)
if (id == end)
id = GUC_INVALID_DOORBELL_ID;
else
- bitmap_set(guc->doorbell_bitmap, id, 1);
+ set_bit(id, guc->doorbell_bitmap);
DRM_DEBUG_DRIVER("assigned %s priority doorbell id 0x%x\n",
hi_pri ? "high" : "normal", id);
@@ -268,11 +296,6 @@ static uint16_t assign_doorbell(struct intel_guc *guc, uint32_t priority)
return id;
}
-static void release_doorbell(struct intel_guc *guc, uint16_t id)
-{
- bitmap_clear(guc->doorbell_bitmap, id, 1);
-}
-
/*
* Initialise the process descriptor shared with the GuC firmware.
*/
@@ -667,15 +690,7 @@ static void guc_client_free(struct drm_device *dev,
if (!client)
return;
- if (client->doorbell_id != GUC_INVALID_DOORBELL_ID) {
- /*
- * First disable the doorbell, then tell the GuC we've
- * finished with it, finally deallocate it in our bitmap
- */
- guc_disable_doorbell(guc, client);
- host2guc_release_doorbell(guc, client);
- release_doorbell(guc, client->doorbell_id);
- }
+ guc_disable_doorbell(guc, client);
/*
* XXX: wait for any outstanding submissions before freeing memory.
@@ -712,6 +727,7 @@ static struct i915_guc_client *guc_client_alloc(struct drm_device *dev,
struct drm_i915_private *dev_priv = dev->dev_private;
struct intel_guc *guc = &dev_priv->guc;
struct drm_i915_gem_object *obj;
+ uint16_t db_id;
client = kzalloc(sizeof(*client), GFP_KERNEL);
if (!client)
@@ -750,22 +766,24 @@ static struct i915_guc_client *guc_client_alloc(struct drm_device *dev,
else
client->proc_desc_offset = (GUC_DB_SIZE / 2);
- client->doorbell_id = assign_doorbell(guc, client->priority);
- if (client->doorbell_id == GUC_INVALID_DOORBELL_ID)
+ db_id = assign_doorbell(guc, client->priority);
+ if (db_id == GUC_INVALID_DOORBELL_ID)
/* XXX: evict a doorbell instead */
goto err;
guc_init_proc_desc(guc, client);
guc_init_ctx_desc(guc, client);
- guc_init_doorbell(guc, client);
+ guc_init_doorbell(guc, client, db_id);
/* XXX: Any cache flushes needed? General domain mgmt calls? */
if (host2guc_allocate_doorbell(guc, client))
goto err;
- DRM_DEBUG_DRIVER("new priority %u client %p: ctx_index %u db_id %u\n",
- priority, client, client->ctx_index, client->doorbell_id);
+ DRM_DEBUG_DRIVER("new priority %u client %p: ctx_index %u\n",
+ priority, client, client->ctx_index);
+ DRM_DEBUG_DRIVER("doorbell id %u, cacheline offset 0x%x\n",
+ client->doorbell_id, client->doorbell_offset);
return client;
--
1.9.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 6/7] drm/i915/guc: (re)initialise doorbell h/w when enabling GuC submission
2016-03-21 10:16 [PATCH 0/7] Fixes and worarounds for GuC issues Dave Gordon
` (4 preceding siblings ...)
2016-03-21 10:16 ` [PATCH 5/7] drm/i915/guc: refactor doorbell management code Dave Gordon
@ 2016-03-21 10:16 ` Dave Gordon
2016-03-21 10:16 ` [PATCH 7/7] drm/i915/guc: disable GuC submission earlier during GuC (re)load Dave Gordon
` (2 subsequent siblings)
8 siblings, 0 replies; 16+ messages in thread
From: Dave Gordon @ 2016-03-21 10:16 UTC (permalink / raw)
To: intel-gfx
During a hibernate/resume cycle, the whole system is reset, including
the GuC and the doorbell hardware. Then the system is booted up, drivers
are loaded, etc -- the GuC firmware may be loaded and set running at this
point. But then, the booted kernel is replaced by the hibernated image,
and this resumed kernel will also try to reload the GuC firmware (which
will fail). To recover, we reset the GuC and try again (which should
work). But this GuC reset doesn't also reset the doorbell hardware, so
it can be left in a state inconsistent with that assumed by the driver
and the GuC.
It would be better if the GuC reset also cleared all doorbell state,
but that's not how the hardware currently works; also, the driver cannot
directly reprogram the doorbell hardware (only the GuC can do that).
So this patch cycles through all doorbells, assigning and releasing each
in turn, so that all the doorbell hardware is left in a consistent state,
no matter how it was programmed by the previously-running kernel and/or
GuC firmware.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Cc: Alex Dai <yu.dai@intel.com>
---
drivers/gpu/drm/i915/i915_guc_submission.c | 46 +++++++++++++++++++++++++++++-
1 file changed, 45 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index 6b30207..ea8b8cb 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -707,6 +707,50 @@ static void guc_client_free(struct drm_device *dev,
kfree(client);
}
+/*
+ * Borrow the first client to set up & tear down every doorbell
+ * in turn, to ensure that all doorbell h/w is (re)initialised.
+ */
+static void guc_init_doorbell_hw(struct intel_guc *guc)
+{
+ struct drm_i915_private *dev_priv = guc_to_i915(guc);
+ struct i915_guc_client *client = guc->execbuf_client;
+ struct guc_doorbell_info *doorbell;
+ uint16_t db_id, i;
+ void *base;
+ int ret;
+
+ base = kmap_atomic(i915_gem_object_get_page(client->client_obj, 0));
+ doorbell = base + client->doorbell_offset;
+ db_id = client->doorbell_id;
+
+ for (i = 0; i < GUC_MAX_DOORBELLS; ++i) {
+ i915_reg_t drbreg = GEN8_DRBREGL(i);
+ u32 value = I915_READ(drbreg);
+
+ ret = guc_update_doorbell_id(client, doorbell, i);
+
+ if (((value & GUC_DOORBELL_ENABLED) && (i != db_id)) || ret)
+ DRM_DEBUG_DRIVER("Doorbell reg 0x%x was 0x%x, ret %d\n",
+ drbreg.reg, value, ret);
+ }
+
+ /* Restore to original value */
+ guc_update_doorbell_id(client, doorbell, db_id);
+
+ for (i = 0; i < GUC_MAX_DOORBELLS; ++i) {
+ i915_reg_t drbreg = GEN8_DRBREGL(i);
+ u32 value = I915_READ(drbreg);
+
+ if ((value & GUC_DOORBELL_ENABLED) && (i != db_id))
+ DRM_DEBUG_DRIVER("Doorbell reg 0x%x finally 0x%x\n",
+ drbreg.reg, value);
+
+ }
+
+ kunmap_atomic(base);
+}
+
/**
* guc_client_alloc() - Allocate an i915_guc_client
* @dev: drm device
@@ -971,8 +1015,8 @@ int i915_guc_submission_enable(struct drm_device *dev)
}
guc->execbuf_client = client;
-
host2guc_sample_forcewake(guc, client);
+ guc_init_doorbell_hw(guc);
return 0;
}
--
1.9.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 7/7] drm/i915/guc: disable GuC submission earlier during GuC (re)load
2016-03-21 10:16 [PATCH 0/7] Fixes and worarounds for GuC issues Dave Gordon
` (5 preceding siblings ...)
2016-03-21 10:16 ` [PATCH 6/7] drm/i915/guc: (re)initialise doorbell h/w when enabling GuC submission Dave Gordon
@ 2016-03-21 10:16 ` Dave Gordon
2016-03-21 16:23 ` ✗ Fi.CI.BAT: warning for Fixes and worarounds for GuC issues Patchwork
2016-03-24 8:13 ` ✗ Fi.CI.BAT: failure for Fixes and worarounds for GuC issues (rev3) Patchwork
8 siblings, 0 replies; 16+ messages in thread
From: Dave Gordon @ 2016-03-21 10:16 UTC (permalink / raw)
To: intel-gfx
When resetting and reloading the GuC, the GuC submission management code
also needs to destroy and recreate the GuC client(s). Currently this is
done by a separate call from the GuC loader, but really, it's just an
internal detail of the submission code. So here we remove the call from
the loader (which is too late, really, because the GuC has already been
reloaded at this point) and put it into guc_submission_init() instead.
This means that any preexisting client is destroyed *before* the GuC
(re)load and then recreated after, iff the firmware was successfully
loaded. If the GuC reload fails, we don't recreate the client, so fallback
to execlists mode (if active) won't leak the client object (previously,
the now-unusable client would have been left allocated, and leaked if
the driver were unloaded).
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Cc: Alex Dai <yu.dai@intel.com>
---
drivers/gpu/drm/i915/i915_guc_submission.c | 6 ++++--
drivers/gpu/drm/i915/intel_guc_loader.c | 3 ---
2 files changed, 4 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index ea8b8cb..28f8c210 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -981,6 +981,10 @@ int i915_guc_submission_init(struct drm_device *dev)
const size_t gemsize = round_up(poolsize, PAGE_SIZE);
struct intel_guc *guc = &dev_priv->guc;
+ /* Wipe bitmap & delete client in case of reinitialisation */
+ bitmap_clear(guc->doorbell_bitmap, 0, GUC_MAX_DOORBELLS);
+ i915_guc_submission_disable(dev);
+
if (!i915.enable_guc_submission)
return 0; /* not enabled */
@@ -992,9 +996,7 @@ int i915_guc_submission_init(struct drm_device *dev)
return -ENOMEM;
ida_init(&guc->ctx_ids);
-
guc_create_log(guc);
-
guc_create_ads(guc);
return 0;
diff --git a/drivers/gpu/drm/i915/intel_guc_loader.c b/drivers/gpu/drm/i915/intel_guc_loader.c
index a875936..47af779 100644
--- a/drivers/gpu/drm/i915/intel_guc_loader.c
+++ b/drivers/gpu/drm/i915/intel_guc_loader.c
@@ -470,9 +470,6 @@ int intel_guc_ucode_load(struct drm_device *dev)
intel_guc_fw_status_repr(guc_fw->guc_fw_load_status));
if (i915.enable_guc_submission) {
- /* The execbuf_client will be recreated. Release it first. */
- i915_guc_submission_disable(dev);
-
err = i915_guc_submission_enable(dev);
if (err)
goto fail;
--
1.9.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 16+ messages in thread
* ✗ Fi.CI.BAT: warning for Fixes and worarounds for GuC issues
2016-03-21 10:16 [PATCH 0/7] Fixes and worarounds for GuC issues Dave Gordon
` (6 preceding siblings ...)
2016-03-21 10:16 ` [PATCH 7/7] drm/i915/guc: disable GuC submission earlier during GuC (re)load Dave Gordon
@ 2016-03-21 16:23 ` Patchwork
2016-03-24 8:13 ` ✗ Fi.CI.BAT: failure for Fixes and worarounds for GuC issues (rev3) Patchwork
8 siblings, 0 replies; 16+ messages in thread
From: Patchwork @ 2016-03-21 16:23 UTC (permalink / raw)
To: Dave Gordon; +Cc: intel-gfx
== Series Details ==
Series: Fixes and worarounds for GuC issues
URL : https://patchwork.freedesktop.org/series/4694/
State : warning
== Summary ==
Series 4694v1 Fixes and worarounds for GuC issues
http://patchwork.freedesktop.org/api/1.0/series/4694/revisions/1/mbox/
Test kms_flip:
Subgroup basic-flip-vs-modeset:
pass -> DMESG-WARN (hsw-brixbox)
pass -> DMESG-WARN (bdw-ultra)
Subgroup basic-flip-vs-wf_vblank:
fail -> PASS (bdw-ultra)
Test kms_setmode:
Subgroup basic-clone-single-crtc:
pass -> DMESG-WARN (snb-dellxps)
Test pm_rpm:
Subgroup basic-pci-d3-state:
dmesg-warn -> PASS (hsw-brixbox)
Subgroup basic-rte:
pass -> DMESG-WARN (bsw-nuc-2)
bdw-nuci7 total:194 pass:182 dwarn:0 dfail:0 fail:0 skip:12
bdw-ultra total:194 pass:172 dwarn:1 dfail:0 fail:0 skip:21
bsw-nuc-2 total:194 pass:156 dwarn:1 dfail:0 fail:0 skip:37
byt-nuc total:194 pass:159 dwarn:0 dfail:0 fail:0 skip:35
hsw-brixbox total:194 pass:170 dwarn:2 dfail:0 fail:0 skip:22
hsw-gt2 total:194 pass:176 dwarn:1 dfail:0 fail:0 skip:17
ilk-hp8440p total:194 pass:131 dwarn:0 dfail:0 fail:0 skip:63
ivb-t430s total:194 pass:169 dwarn:0 dfail:0 fail:0 skip:25
skl-i5k-2 total:194 pass:171 dwarn:0 dfail:0 fail:0 skip:23
skl-i7k-2 total:194 pass:171 dwarn:0 dfail:0 fail:0 skip:23
skl-nuci5 total:194 pass:183 dwarn:0 dfail:0 fail:0 skip:11
snb-dellxps total:194 pass:158 dwarn:2 dfail:0 fail:0 skip:34
snb-x220t total:194 pass:159 dwarn:1 dfail:0 fail:1 skip:33
Results at /archive/results/CI_IGT_test/Patchwork_1660/
3b8b9e2deb9f23e1a841d0f9d80296a9759ff8f8 drm-intel-nightly: 2016y-03m-21d-13h-03m-53s UTC integration manifest
05b0ac6bb1229e3ce67d8eeee63d31c63881a790 drm/i915/guc: disable GuC submission earlier during GuC (re)load
e8732066acfd15e934ae81a0a32795e68fc2238a drm/i915/guc: (re)initialise doorbell h/w when enabling GuC submission
2540484ff57c39c82e1077b068662e764f48c392 drm/i915/guc: refactor doorbell management code
d9a934919151a4c9ff50684344df96bce88f95ad drm/i915/guc: move guc_ring_doorbell() nearer to callsite
29c015879989226e4160520c47b4fc06c6b181d4 drm/i915/guc: add doorbell map to debugfs/i915_guc_info
c3da0b176b15cb8c507c996f93f3cbf21de35dfb drm/i915/guc: always reset GuC before loading firmware
91deefc6dea89149dfb66f43823e69bc5ac00e27 drm/i915/guc: Reset GuC and retry on firmware load failure
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/7] drm/i915/guc: Reset GuC and retry on firmware load failure
2016-03-21 10:16 ` [PATCH 1/7] drm/i915/guc: Reset GuC and retry on firmware load failure Dave Gordon
@ 2016-03-21 16:58 ` Arun Siluvery
2016-03-23 16:03 ` Dave Gordon
0 siblings, 1 reply; 16+ messages in thread
From: Arun Siluvery @ 2016-03-21 16:58 UTC (permalink / raw)
To: Dave Gordon, intel-gfx
On 21/03/2016 10:16, Dave Gordon wrote:
> From: Arun Siluvery <arun.siluvery@linux.intel.com>
>
> Due to timing issues in the HW some of the status bits required for GuC
> authentication doesn't get set occassionally, when that happens, GuC cannot
> be initialized and we will be left with a wedged GPU. The WA suggested is
> to perform a soft reset of GuC and attempt to reload the fw again for few
> times before giving up.
>
> As the failure is dependent on timing, tests performed by triggering manual
> full gpu reset (i915_wedged) showed that we could sometimes hit this after
> several thousand iterations but sometimes tests ran even longer without any
> issues. Reset and reload mechanism proved helpful when we indeed hit fw
> load failure so it is better to include this to improve driver stability.
>
> This change implements the following WA,
>
> WaEnableuKernelHeaderValidFix:skl,bxt
> WaEnableGuCBootHashCheckNotSet:skl,bxt
>
> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> Cc: Alex Dai <yu.dai@intel.com>
> ---
This patch was previously reviewed by Alex,
https://patchwork.freedesktop.org/patch/76122/
> drivers/gpu/drm/i915/i915_drv.h | 1 +
> drivers/gpu/drm/i915/i915_guc_reg.h | 1 +
> drivers/gpu/drm/i915/i915_reg.h | 1 +
> drivers/gpu/drm/i915/intel_guc_loader.c | 49 +++++++++++++++++++++++++++++++--
> drivers/gpu/drm/i915/intel_uncore.c | 19 +++++++++++++
> 5 files changed, 69 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 00c41a4..5418850 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2711,6 +2711,7 @@ extern long i915_compat_ioctl(struct file *filp, unsigned int cmd,
> extern int intel_gpu_reset(struct drm_device *dev, u32 engine_mask);
> extern bool intel_has_gpu_reset(struct drm_device *dev);
> extern int i915_reset(struct drm_device *dev);
> +extern int intel_guc_reset(struct drm_i915_private *dev_priv);
> extern unsigned long i915_chipset_val(struct drm_i915_private *dev_priv);
> extern unsigned long i915_mch_val(struct drm_i915_private *dev_priv);
> extern unsigned long i915_gfx_val(struct drm_i915_private *dev_priv);
> diff --git a/drivers/gpu/drm/i915/i915_guc_reg.h b/drivers/gpu/drm/i915/i915_guc_reg.h
> index e4ba582..94ceee5 100644
> --- a/drivers/gpu/drm/i915/i915_guc_reg.h
> +++ b/drivers/gpu/drm/i915/i915_guc_reg.h
> @@ -27,6 +27,7 @@
> /* Definitions of GuC H/W registers, bits, etc */
>
> #define GUC_STATUS _MMIO(0xc000)
> +#define GS_MIA_IN_RESET (1 << 0)
> #define GS_BOOTROM_SHIFT 1
> #define GS_BOOTROM_MASK (0x7F << GS_BOOTROM_SHIFT)
> #define GS_BOOTROM_RSA_FAILED (0x50 << GS_BOOTROM_SHIFT)
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 07e0449..cc71ca2 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -165,6 +165,7 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
> #define GEN6_GRDOM_MEDIA (1 << 2)
> #define GEN6_GRDOM_BLT (1 << 3)
> #define GEN6_GRDOM_VECS (1 << 4)
> +#define GEN9_GRDOM_GUC (1 << 5)
> #define GEN8_GRDOM_MEDIA2 (1 << 7)
In the original patch GEN9_GRDOM_GUC was defined like above but during
tdr patch review, option of arranging them according to gen was
explored. In that case GEN9_GRDOM_GUC will be at the end.
regards
Arun
>
> #define RING_PP_DIR_BASE(ring) _MMIO((ring)->mmio_base+0x228)
> diff --git a/drivers/gpu/drm/i915/intel_guc_loader.c b/drivers/gpu/drm/i915/intel_guc_loader.c
> index e1aff62..a07c228 100644
> --- a/drivers/gpu/drm/i915/intel_guc_loader.c
> +++ b/drivers/gpu/drm/i915/intel_guc_loader.c
> @@ -353,6 +353,24 @@ static int guc_ucode_xfer(struct drm_i915_private *dev_priv)
> return ret;
> }
>
> +static int i915_reset_guc(struct drm_i915_private *dev_priv)
> +{
> + int ret;
> + u32 guc_status;
> +
> + ret = intel_guc_reset(dev_priv);
> + if (ret) {
> + DRM_ERROR("GuC reset failed, ret = %d\n", ret);
> + return ret;
> + }
> +
> + guc_status = I915_READ(GUC_STATUS);
> + WARN(!(guc_status & GS_MIA_IN_RESET),
> + "GuC status: 0x%x, MIA core expected to be in reset\n", guc_status);
> +
> + return ret;
> +}
> +
> /**
> * intel_guc_ucode_load() - load GuC uCode into the device
> * @dev: drm device
> @@ -417,9 +435,36 @@ int intel_guc_ucode_load(struct drm_device *dev)
> if (err)
> goto fail;
>
> + /*
> + * WaEnableuKernelHeaderValidFix:skl,bxt
> + * For BXT, this is only upto B0 but below WA is required for later
> + * steppings also so this is extended as well.
> + */
> + /* WaEnableGuCBootHashCheckNotSet:skl,bxt */
> err = guc_ucode_xfer(dev_priv);
> - if (err)
> - goto fail;
> + if (err) {
> + int retries = 3;
> +
> + DRM_ERROR("GuC fw load failed, err=%d, attempting reset and retry\n", err);
> +
> + while (retries--) {
> + err = i915_reset_guc(dev_priv);
> + if (err)
> + break;
> +
> + err = guc_ucode_xfer(dev_priv);
> + if (!err) {
> + DRM_DEBUG_DRIVER("GuC fw reload succeeded after reset\n");
> + break;
> + }
> + DRM_DEBUG_DRIVER("GuC fw reload retries left: %d\n", retries);
> + }
> +
> + if (err) {
> + DRM_ERROR("GuC fw reload attempt failed, ret=%d\n", err);
> + goto fail;
> + }
> + }
>
> guc_fw->guc_fw_load_status = GUC_FIRMWARE_SUCCESS;
>
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index 512b7fa..d44e07e 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -1673,6 +1673,25 @@ bool intel_has_gpu_reset(struct drm_device *dev)
> return intel_get_gpu_reset(dev) != NULL;
> }
>
> +int intel_guc_reset(struct drm_i915_private *dev_priv)
> +{
> + int ret;
> + unsigned long irqflags;
> +
> + if (!i915.enable_guc_submission)
> + return -EINVAL;
> +
> + intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
> + spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
> +
> + ret = gen6_hw_domain_reset(dev_priv, GEN9_GRDOM_GUC);
> +
> + spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
> + intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
> +
> + return ret;
> +}
> +
> bool intel_uncore_unclaimed_mmio(struct drm_i915_private *dev_priv)
> {
> return check_for_unclaimed_mmio(dev_priv);
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/7] drm/i915/guc: always reset GuC before loading firmware
2016-03-21 10:16 ` [PATCH 2/7] drm/i915/guc: always reset GuC before loading firmware Dave Gordon
@ 2016-03-22 13:32 ` Arun Siluvery
2016-03-22 18:42 ` Arun Siluvery
0 siblings, 1 reply; 16+ messages in thread
From: Arun Siluvery @ 2016-03-22 13:32 UTC (permalink / raw)
To: Dave Gordon, intel-gfx
On 21/03/2016 10:16, Dave Gordon wrote:
> After a suspend-resume cycle, the resumed kernel has no idea what the
> booted kernel may have done to the GuC before replacing itself with the
> resumed image. In particular, it may have already loaded the GuC with
> firmware, which will then cause this kernel's attempt to (re)load the
> firmware to fail (GuC program memory is write-once!).
>
> So here we choose to always reset the GuC just before (re)loading the
> firmware, so the hardware should then be in a well-known state, and we
> may even avoid some of the issues arising from unpredictable timing.
>
> Also added some more fields & values to the definition of the GUC_STATUS
> register, which is the key diagnostic indicator if the GuC load fails.
>
> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
> Cc: Alex Dai <yu.dai@intel.com>
>
Please add bugzilla reference.
> ---
> drivers/gpu/drm/i915/i915_guc_reg.h | 12 ++++++++--
> drivers/gpu/drm/i915/intel_guc_loader.c | 40 ++++++++++++++++-----------------
> 2 files changed, 29 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_guc_reg.h b/drivers/gpu/drm/i915/i915_guc_reg.h
> index 94ceee5..80786d9 100644
> --- a/drivers/gpu/drm/i915/i915_guc_reg.h
> +++ b/drivers/gpu/drm/i915/i915_guc_reg.h
> @@ -27,10 +27,12 @@
> /* Definitions of GuC H/W registers, bits, etc */
>
> #define GUC_STATUS _MMIO(0xc000)
> -#define GS_MIA_IN_RESET (1 << 0)
> +#define GS_RESET_SHIFT 0
> +#define GS_MIA_IN_RESET (0x01 << GS_RESET_SHIFT)
> #define GS_BOOTROM_SHIFT 1
> #define GS_BOOTROM_MASK (0x7F << GS_BOOTROM_SHIFT)
> #define GS_BOOTROM_RSA_FAILED (0x50 << GS_BOOTROM_SHIFT)
> +#define GS_BOOTROM_JUMP_PASSED (0x76 << GS_BOOTROM_SHIFT)
> #define GS_UKERNEL_SHIFT 8
> #define GS_UKERNEL_MASK (0xFF << GS_UKERNEL_SHIFT)
> #define GS_UKERNEL_LAPIC_DONE (0x30 << GS_UKERNEL_SHIFT)
> @@ -38,7 +40,13 @@
> #define GS_UKERNEL_READY (0xF0 << GS_UKERNEL_SHIFT)
> #define GS_MIA_SHIFT 16
> #define GS_MIA_MASK (0x07 << GS_MIA_SHIFT)
> -#define GS_MIA_CORE_STATE (1 << GS_MIA_SHIFT)
> +#define GS_MIA_CORE_STATE (0x01 << GS_MIA_SHIFT)
> +#define GS_MIA_HALT_REQUESTED (0x02 << GS_MIA_SHIFT)
> +#define GS_MIA_ISR_ENTRY (0x04 << GS_MIA_SHIFT)
> +#define GS_AUTH_STATUS_SHIFT 30
> +#define GS_AUTH_STATUS_MASK (0x03 << GS_AUTH_STATUS_SHIFT)
> +#define GS_AUTH_STATUS_BAD (0x01 << GS_AUTH_STATUS_SHIFT)
> +#define GS_AUTH_STATUS_GOOD (0x02 << GS_AUTH_STATUS_SHIFT)
>
> #define SOFT_SCRATCH(n) _MMIO(0xc180 + (n) * 4)
> #define SOFT_SCRATCH_COUNT 16
> diff --git a/drivers/gpu/drm/i915/intel_guc_loader.c b/drivers/gpu/drm/i915/intel_guc_loader.c
> index a07c228..a875936 100644
> --- a/drivers/gpu/drm/i915/intel_guc_loader.c
> +++ b/drivers/gpu/drm/i915/intel_guc_loader.c
> @@ -387,7 +387,7 @@ int intel_guc_ucode_load(struct drm_device *dev)
> {
> struct drm_i915_private *dev_priv = dev->dev_private;
> struct intel_guc_fw *guc_fw = &dev_priv->guc.guc_fw;
> - int err = 0;
> + int retries, err = 0;
>
> if (!i915.enable_guc_submission)
> return 0;
> @@ -441,29 +441,26 @@ int intel_guc_ucode_load(struct drm_device *dev)
> * steppings also so this is extended as well.
> */
> /* WaEnableGuCBootHashCheckNotSet:skl,bxt */
> - err = guc_ucode_xfer(dev_priv);
> - if (err) {
> - int retries = 3;
> -
> - DRM_ERROR("GuC fw load failed, err=%d, attempting reset and retry\n", err);
> -
> - while (retries--) {
> - err = i915_reset_guc(dev_priv);
> - if (err)
> - break;
> -
> - err = guc_ucode_xfer(dev_priv);
> - if (!err) {
> - DRM_DEBUG_DRIVER("GuC fw reload succeeded after reset\n");
> - break;
> - }
> - DRM_DEBUG_DRIVER("GuC fw reload retries left: %d\n", retries);
> - }
> -
> + for (retries = 3; ; ) {
> + /*
> + * Always reset the GuC just before (re)loading, so
> + * that the state and timing are fairly predictable
> + */
> + err = i915_reset_guc(dev_priv);
> if (err) {
> - DRM_ERROR("GuC fw reload attempt failed, ret=%d\n", err);
> + DRM_ERROR("GuC reset failed, err %d\n", err);
> goto fail;
> }
> +
> + err = guc_ucode_xfer(dev_priv);
> + if (!err)
> + break;
> +
> + if (--retries == 0)
> + goto fail;
> +
> + DRM_INFO("GuC fw load failed, err %d; will reset and "
> + "retry %d more time(s)\n", err, retries);
> }
>
> guc_fw->guc_fw_load_status = GUC_FIRMWARE_SUCCESS;
> @@ -485,6 +482,7 @@ int intel_guc_ucode_load(struct drm_device *dev)
> return 0;
>
> fail:
> + DRM_ERROR("GuC firmware load failed, err %d\n", err);
> if (guc_fw->guc_fw_load_status == GUC_FIRMWARE_PENDING)
> guc_fw->guc_fw_load_status = GUC_FIRMWARE_FAIL;
>
>
looks good to me,
Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com>
regards
Arun
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/7] drm/i915/guc: always reset GuC before loading firmware
2016-03-22 13:32 ` Arun Siluvery
@ 2016-03-22 18:42 ` Arun Siluvery
2016-03-23 17:05 ` [PATCH v2 1/2] drm/i915/guc: Reset GuC and retry on firmware load failure Dave Gordon
2016-03-23 17:05 ` [PATCH v2 2/2] drm/i915/guc: always reset GuC before loading firmware Dave Gordon
0 siblings, 2 replies; 16+ messages in thread
From: Arun Siluvery @ 2016-03-22 18:42 UTC (permalink / raw)
To: Dave Gordon, intel-gfx
On 22/03/2016 13:32, Arun Siluvery wrote:
> On 21/03/2016 10:16, Dave Gordon wrote:
>> After a suspend-resume cycle, the resumed kernel has no idea what the
>> booted kernel may have done to the GuC before replacing itself with the
>> resumed image. In particular, it may have already loaded the GuC with
>> firmware, which will then cause this kernel's attempt to (re)load the
>> firmware to fail (GuC program memory is write-once!).
>>
>> So here we choose to always reset the GuC just before (re)loading the
>> firmware, so the hardware should then be in a well-known state, and we
>> may even avoid some of the issues arising from unpredictable timing.
>>
>> Also added some more fields & values to the definition of the GUC_STATUS
>> register, which is the key diagnostic indicator if the GuC load fails.
>>
>> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
>> Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
>> Cc: Alex Dai <yu.dai@intel.com>
>>
> Please add bugzilla reference.
https://bugs.freedesktop.org/show_bug.cgi?id=94390
regards
Arun
>
>> ---
>> drivers/gpu/drm/i915/i915_guc_reg.h | 12 ++++++++--
>> drivers/gpu/drm/i915/intel_guc_loader.c | 40
>> ++++++++++++++++-----------------
>> 2 files changed, 29 insertions(+), 23 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_guc_reg.h
>> b/drivers/gpu/drm/i915/i915_guc_reg.h
>> index 94ceee5..80786d9 100644
>> --- a/drivers/gpu/drm/i915/i915_guc_reg.h
>> +++ b/drivers/gpu/drm/i915/i915_guc_reg.h
>> @@ -27,10 +27,12 @@
>> /* Definitions of GuC H/W registers, bits, etc */
>>
>> #define GUC_STATUS _MMIO(0xc000)
>> -#define GS_MIA_IN_RESET (1 << 0)
>> +#define GS_RESET_SHIFT 0
>> +#define GS_MIA_IN_RESET (0x01 << GS_RESET_SHIFT)
>> #define GS_BOOTROM_SHIFT 1
>> #define GS_BOOTROM_MASK (0x7F << GS_BOOTROM_SHIFT)
>> #define GS_BOOTROM_RSA_FAILED (0x50 << GS_BOOTROM_SHIFT)
>> +#define GS_BOOTROM_JUMP_PASSED (0x76 << GS_BOOTROM_SHIFT)
>> #define GS_UKERNEL_SHIFT 8
>> #define GS_UKERNEL_MASK (0xFF << GS_UKERNEL_SHIFT)
>> #define GS_UKERNEL_LAPIC_DONE (0x30 << GS_UKERNEL_SHIFT)
>> @@ -38,7 +40,13 @@
>> #define GS_UKERNEL_READY (0xF0 << GS_UKERNEL_SHIFT)
>> #define GS_MIA_SHIFT 16
>> #define GS_MIA_MASK (0x07 << GS_MIA_SHIFT)
>> -#define GS_MIA_CORE_STATE (1 << GS_MIA_SHIFT)
>> +#define GS_MIA_CORE_STATE (0x01 << GS_MIA_SHIFT)
>> +#define GS_MIA_HALT_REQUESTED (0x02 << GS_MIA_SHIFT)
>> +#define GS_MIA_ISR_ENTRY (0x04 << GS_MIA_SHIFT)
>> +#define GS_AUTH_STATUS_SHIFT 30
>> +#define GS_AUTH_STATUS_MASK (0x03 << GS_AUTH_STATUS_SHIFT)
>> +#define GS_AUTH_STATUS_BAD (0x01 << GS_AUTH_STATUS_SHIFT)
>> +#define GS_AUTH_STATUS_GOOD (0x02 << GS_AUTH_STATUS_SHIFT)
>>
>> #define SOFT_SCRATCH(n) _MMIO(0xc180 + (n) * 4)
>> #define SOFT_SCRATCH_COUNT 16
>> diff --git a/drivers/gpu/drm/i915/intel_guc_loader.c
>> b/drivers/gpu/drm/i915/intel_guc_loader.c
>> index a07c228..a875936 100644
>> --- a/drivers/gpu/drm/i915/intel_guc_loader.c
>> +++ b/drivers/gpu/drm/i915/intel_guc_loader.c
>> @@ -387,7 +387,7 @@ int intel_guc_ucode_load(struct drm_device *dev)
>> {
>> struct drm_i915_private *dev_priv = dev->dev_private;
>> struct intel_guc_fw *guc_fw = &dev_priv->guc.guc_fw;
>> - int err = 0;
>> + int retries, err = 0;
>>
>> if (!i915.enable_guc_submission)
>> return 0;
>> @@ -441,29 +441,26 @@ int intel_guc_ucode_load(struct drm_device *dev)
>> * steppings also so this is extended as well.
>> */
>> /* WaEnableGuCBootHashCheckNotSet:skl,bxt */
>> - err = guc_ucode_xfer(dev_priv);
>> - if (err) {
>> - int retries = 3;
>> -
>> - DRM_ERROR("GuC fw load failed, err=%d, attempting reset and
>> retry\n", err);
>> -
>> - while (retries--) {
>> - err = i915_reset_guc(dev_priv);
>> - if (err)
>> - break;
>> -
>> - err = guc_ucode_xfer(dev_priv);
>> - if (!err) {
>> - DRM_DEBUG_DRIVER("GuC fw reload succeeded after
>> reset\n");
>> - break;
>> - }
>> - DRM_DEBUG_DRIVER("GuC fw reload retries left: %d\n",
>> retries);
>> - }
>> -
>> + for (retries = 3; ; ) {
>> + /*
>> + * Always reset the GuC just before (re)loading, so
>> + * that the state and timing are fairly predictable
>> + */
>> + err = i915_reset_guc(dev_priv);
>> if (err) {
>> - DRM_ERROR("GuC fw reload attempt failed, ret=%d\n", err);
>> + DRM_ERROR("GuC reset failed, err %d\n", err);
>> goto fail;
>> }
>> +
>> + err = guc_ucode_xfer(dev_priv);
>> + if (!err)
>> + break;
>> +
>> + if (--retries == 0)
>> + goto fail;
>> +
>> + DRM_INFO("GuC fw load failed, err %d; will reset and "
>> + "retry %d more time(s)\n", err, retries);
>> }
>>
>> guc_fw->guc_fw_load_status = GUC_FIRMWARE_SUCCESS;
>> @@ -485,6 +482,7 @@ int intel_guc_ucode_load(struct drm_device *dev)
>> return 0;
>>
>> fail:
>> + DRM_ERROR("GuC firmware load failed, err %d\n", err);
>> if (guc_fw->guc_fw_load_status == GUC_FIRMWARE_PENDING)
>> guc_fw->guc_fw_load_status = GUC_FIRMWARE_FAIL;
>>
>>
> looks good to me,
> Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>
> regards
> Arun
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/7] drm/i915/guc: Reset GuC and retry on firmware load failure
2016-03-21 16:58 ` Arun Siluvery
@ 2016-03-23 16:03 ` Dave Gordon
0 siblings, 0 replies; 16+ messages in thread
From: Dave Gordon @ 2016-03-23 16:03 UTC (permalink / raw)
To: Arun Siluvery, intel-gfx
On 21/03/16 16:58, Arun Siluvery wrote:
> On 21/03/2016 10:16, Dave Gordon wrote:
>> From: Arun Siluvery <arun.siluvery@linux.intel.com>
>>
>> Due to timing issues in the HW some of the status bits required for GuC
>> authentication doesn't get set occassionally, when that happens, GuC
>> cannot
>> be initialized and we will be left with a wedged GPU. The WA suggested is
>> to perform a soft reset of GuC and attempt to reload the fw again for few
>> times before giving up.
>>
>> As the failure is dependent on timing, tests performed by triggering
>> manual
>> full gpu reset (i915_wedged) showed that we could sometimes hit this
>> after
>> several thousand iterations but sometimes tests ran even longer
>> without any
>> issues. Reset and reload mechanism proved helpful when we indeed hit fw
>> load failure so it is better to include this to improve driver stability.
>>
>> This change implements the following WA,
>>
>> WaEnableuKernelHeaderValidFix:skl,bxt
>> WaEnableGuCBootHashCheckNotSet:skl,bxt
>>
>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
>> Cc: Alex Dai <yu.dai@intel.com>
>> ---
> This patch was previously reviewed by Alex,
>
> https://patchwork.freedesktop.org/patch/76122/
OK, I'm going to repost just the first two from this set, tagged with
R-Bs from Alex & you, and the Bugzilla reference; then the rest can form
a new sequence for Alex to review when he gets back from vacation.
>> diff --git a/drivers/gpu/drm/i915/i915_reg.h
>> b/drivers/gpu/drm/i915/i915_reg.h
>> index 07e0449..cc71ca2 100644
>> --- a/drivers/gpu/drm/i915/i915_reg.h
>> +++ b/drivers/gpu/drm/i915/i915_reg.h
>> @@ -165,6 +165,7 @@ static inline bool i915_mmio_reg_valid(i915_reg_t
>> reg)
>> #define GEN6_GRDOM_MEDIA (1 << 2)
>> #define GEN6_GRDOM_BLT (1 << 3)
>> #define GEN6_GRDOM_VECS (1 << 4)
>> +#define GEN9_GRDOM_GUC (1 << 5)
>> #define GEN8_GRDOM_MEDIA2 (1 << 7)
> In the original patch GEN9_GRDOM_GUC was defined like above but during
> tdr patch review, option of arranging them according to gen was
> explored. In that case GEN9_GRDOM_GUC will be at the end.
>
> regards
> Arun
I think they're better in bit-number order, because that way it's easier
to see that you haven't accidentally got overlapping definitions.
.Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v2 1/2] drm/i915/guc: Reset GuC and retry on firmware load failure
2016-03-22 18:42 ` Arun Siluvery
@ 2016-03-23 17:05 ` Dave Gordon
2016-03-23 17:05 ` [PATCH v2 2/2] drm/i915/guc: always reset GuC before loading firmware Dave Gordon
1 sibling, 0 replies; 16+ messages in thread
From: Dave Gordon @ 2016-03-23 17:05 UTC (permalink / raw)
To: intel-gfx
From: Arun Siluvery <arun.siluvery@linux.intel.com>
Due to timing issues in the HW some of the status bits required for GuC
authentication doesn't get set occassionally, when that happens, GuC cannot
be initialized and we will be left with a wedged GPU. The WA suggested is
to perform a soft reset of GuC and attempt to reload the fw again for few
times before giving up.
As the failure is dependent on timing, tests performed by triggering manual
full gpu reset (i915_wedged) showed that we could sometimes hit this after
several thousand iterations but sometimes tests ran even longer without any
issues. Reset and reload mechanism proved helpful when we indeed hit fw
load failure so it is better to include this to improve driver stability.
This change implements the following WA,
WaEnableuKernelHeaderValidFix:skl,bxt
WaEnableGuCBootHashCheckNotSet:skl,bxt
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Alex Dai <yu.dai@intel.com>
---
drivers/gpu/drm/i915/i915_drv.h | 1 +
drivers/gpu/drm/i915/i915_guc_reg.h | 1 +
drivers/gpu/drm/i915/i915_reg.h | 1 +
drivers/gpu/drm/i915/intel_guc_loader.c | 49 +++++++++++++++++++++++++++++++--
drivers/gpu/drm/i915/intel_uncore.c | 19 +++++++++++++
5 files changed, 69 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 9d29ab0..8380fe0 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2740,6 +2740,7 @@ extern long i915_compat_ioctl(struct file *filp, unsigned int cmd,
extern int intel_gpu_reset(struct drm_device *dev, u32 engine_mask);
extern bool intel_has_gpu_reset(struct drm_device *dev);
extern int i915_reset(struct drm_device *dev);
+extern int intel_guc_reset(struct drm_i915_private *dev_priv);
extern void intel_engine_init_hangcheck(struct intel_engine_cs *engine);
extern unsigned long i915_chipset_val(struct drm_i915_private *dev_priv);
extern unsigned long i915_mch_val(struct drm_i915_private *dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_guc_reg.h b/drivers/gpu/drm/i915/i915_guc_reg.h
index e4ba582..94ceee5 100644
--- a/drivers/gpu/drm/i915/i915_guc_reg.h
+++ b/drivers/gpu/drm/i915/i915_guc_reg.h
@@ -27,6 +27,7 @@
/* Definitions of GuC H/W registers, bits, etc */
#define GUC_STATUS _MMIO(0xc000)
+#define GS_MIA_IN_RESET (1 << 0)
#define GS_BOOTROM_SHIFT 1
#define GS_BOOTROM_MASK (0x7F << GS_BOOTROM_SHIFT)
#define GS_BOOTROM_RSA_FAILED (0x50 << GS_BOOTROM_SHIFT)
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index f3ba43c..ed0531ac 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -165,6 +165,7 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
#define GEN6_GRDOM_MEDIA (1 << 2)
#define GEN6_GRDOM_BLT (1 << 3)
#define GEN6_GRDOM_VECS (1 << 4)
+#define GEN9_GRDOM_GUC (1 << 5)
#define GEN8_GRDOM_MEDIA2 (1 << 7)
#define RING_PP_DIR_BASE(ring) _MMIO((ring)->mmio_base+0x228)
diff --git a/drivers/gpu/drm/i915/intel_guc_loader.c b/drivers/gpu/drm/i915/intel_guc_loader.c
index e1aff62..a07c228 100644
--- a/drivers/gpu/drm/i915/intel_guc_loader.c
+++ b/drivers/gpu/drm/i915/intel_guc_loader.c
@@ -353,6 +353,24 @@ static int guc_ucode_xfer(struct drm_i915_private *dev_priv)
return ret;
}
+static int i915_reset_guc(struct drm_i915_private *dev_priv)
+{
+ int ret;
+ u32 guc_status;
+
+ ret = intel_guc_reset(dev_priv);
+ if (ret) {
+ DRM_ERROR("GuC reset failed, ret = %d\n", ret);
+ return ret;
+ }
+
+ guc_status = I915_READ(GUC_STATUS);
+ WARN(!(guc_status & GS_MIA_IN_RESET),
+ "GuC status: 0x%x, MIA core expected to be in reset\n", guc_status);
+
+ return ret;
+}
+
/**
* intel_guc_ucode_load() - load GuC uCode into the device
* @dev: drm device
@@ -417,9 +435,36 @@ int intel_guc_ucode_load(struct drm_device *dev)
if (err)
goto fail;
+ /*
+ * WaEnableuKernelHeaderValidFix:skl,bxt
+ * For BXT, this is only upto B0 but below WA is required for later
+ * steppings also so this is extended as well.
+ */
+ /* WaEnableGuCBootHashCheckNotSet:skl,bxt */
err = guc_ucode_xfer(dev_priv);
- if (err)
- goto fail;
+ if (err) {
+ int retries = 3;
+
+ DRM_ERROR("GuC fw load failed, err=%d, attempting reset and retry\n", err);
+
+ while (retries--) {
+ err = i915_reset_guc(dev_priv);
+ if (err)
+ break;
+
+ err = guc_ucode_xfer(dev_priv);
+ if (!err) {
+ DRM_DEBUG_DRIVER("GuC fw reload succeeded after reset\n");
+ break;
+ }
+ DRM_DEBUG_DRIVER("GuC fw reload retries left: %d\n", retries);
+ }
+
+ if (err) {
+ DRM_ERROR("GuC fw reload attempt failed, ret=%d\n", err);
+ goto fail;
+ }
+ }
guc_fw->guc_fw_load_status = GUC_FIRMWARE_SUCCESS;
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 512b7fa..d44e07e 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1673,6 +1673,25 @@ bool intel_has_gpu_reset(struct drm_device *dev)
return intel_get_gpu_reset(dev) != NULL;
}
+int intel_guc_reset(struct drm_i915_private *dev_priv)
+{
+ int ret;
+ unsigned long irqflags;
+
+ if (!i915.enable_guc_submission)
+ return -EINVAL;
+
+ intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
+ spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
+
+ ret = gen6_hw_domain_reset(dev_priv, GEN9_GRDOM_GUC);
+
+ spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
+ intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
+
+ return ret;
+}
+
bool intel_uncore_unclaimed_mmio(struct drm_i915_private *dev_priv)
{
return check_for_unclaimed_mmio(dev_priv);
--
1.9.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v2 2/2] drm/i915/guc: always reset GuC before loading firmware
2016-03-22 18:42 ` Arun Siluvery
2016-03-23 17:05 ` [PATCH v2 1/2] drm/i915/guc: Reset GuC and retry on firmware load failure Dave Gordon
@ 2016-03-23 17:05 ` Dave Gordon
1 sibling, 0 replies; 16+ messages in thread
From: Dave Gordon @ 2016-03-23 17:05 UTC (permalink / raw)
To: intel-gfx
After a suspend-resume cycle, the resumed kernel has no idea what the
booted kernel may have done to the GuC before replacing itself with the
resumed image. In particular, it may have already loaded the GuC with
firmware, which will then cause this kernel's attempt to (re)load the
firmware to fail (GuC program memory is write-once!). The symptoms
(GuC firmware reload fails after hibernation) are further described
in the Bugzilla reference below.
So let's *always* reset the GuC just before (re)loading the firmware;
then the hardware should then be in a well-known state, and we may even
avoid some of the issues arising from unpredictable timing.
Also added some more fields & values to the definition of the GUC_STATUS
register, which is the key diagnostic indicator if the GuC load fails.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Cc: Alex Dai <yu.dai@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94390
---
drivers/gpu/drm/i915/i915_guc_reg.h | 12 ++++++++--
drivers/gpu/drm/i915/intel_guc_loader.c | 40 ++++++++++++++++-----------------
2 files changed, 29 insertions(+), 23 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_guc_reg.h b/drivers/gpu/drm/i915/i915_guc_reg.h
index 94ceee5..80786d9 100644
--- a/drivers/gpu/drm/i915/i915_guc_reg.h
+++ b/drivers/gpu/drm/i915/i915_guc_reg.h
@@ -27,10 +27,12 @@
/* Definitions of GuC H/W registers, bits, etc */
#define GUC_STATUS _MMIO(0xc000)
-#define GS_MIA_IN_RESET (1 << 0)
+#define GS_RESET_SHIFT 0
+#define GS_MIA_IN_RESET (0x01 << GS_RESET_SHIFT)
#define GS_BOOTROM_SHIFT 1
#define GS_BOOTROM_MASK (0x7F << GS_BOOTROM_SHIFT)
#define GS_BOOTROM_RSA_FAILED (0x50 << GS_BOOTROM_SHIFT)
+#define GS_BOOTROM_JUMP_PASSED (0x76 << GS_BOOTROM_SHIFT)
#define GS_UKERNEL_SHIFT 8
#define GS_UKERNEL_MASK (0xFF << GS_UKERNEL_SHIFT)
#define GS_UKERNEL_LAPIC_DONE (0x30 << GS_UKERNEL_SHIFT)
@@ -38,7 +40,13 @@
#define GS_UKERNEL_READY (0xF0 << GS_UKERNEL_SHIFT)
#define GS_MIA_SHIFT 16
#define GS_MIA_MASK (0x07 << GS_MIA_SHIFT)
-#define GS_MIA_CORE_STATE (1 << GS_MIA_SHIFT)
+#define GS_MIA_CORE_STATE (0x01 << GS_MIA_SHIFT)
+#define GS_MIA_HALT_REQUESTED (0x02 << GS_MIA_SHIFT)
+#define GS_MIA_ISR_ENTRY (0x04 << GS_MIA_SHIFT)
+#define GS_AUTH_STATUS_SHIFT 30
+#define GS_AUTH_STATUS_MASK (0x03 << GS_AUTH_STATUS_SHIFT)
+#define GS_AUTH_STATUS_BAD (0x01 << GS_AUTH_STATUS_SHIFT)
+#define GS_AUTH_STATUS_GOOD (0x02 << GS_AUTH_STATUS_SHIFT)
#define SOFT_SCRATCH(n) _MMIO(0xc180 + (n) * 4)
#define SOFT_SCRATCH_COUNT 16
diff --git a/drivers/gpu/drm/i915/intel_guc_loader.c b/drivers/gpu/drm/i915/intel_guc_loader.c
index a07c228..a875936 100644
--- a/drivers/gpu/drm/i915/intel_guc_loader.c
+++ b/drivers/gpu/drm/i915/intel_guc_loader.c
@@ -387,7 +387,7 @@ int intel_guc_ucode_load(struct drm_device *dev)
{
struct drm_i915_private *dev_priv = dev->dev_private;
struct intel_guc_fw *guc_fw = &dev_priv->guc.guc_fw;
- int err = 0;
+ int retries, err = 0;
if (!i915.enable_guc_submission)
return 0;
@@ -441,29 +441,26 @@ int intel_guc_ucode_load(struct drm_device *dev)
* steppings also so this is extended as well.
*/
/* WaEnableGuCBootHashCheckNotSet:skl,bxt */
- err = guc_ucode_xfer(dev_priv);
- if (err) {
- int retries = 3;
-
- DRM_ERROR("GuC fw load failed, err=%d, attempting reset and retry\n", err);
-
- while (retries--) {
- err = i915_reset_guc(dev_priv);
- if (err)
- break;
-
- err = guc_ucode_xfer(dev_priv);
- if (!err) {
- DRM_DEBUG_DRIVER("GuC fw reload succeeded after reset\n");
- break;
- }
- DRM_DEBUG_DRIVER("GuC fw reload retries left: %d\n", retries);
- }
-
+ for (retries = 3; ; ) {
+ /*
+ * Always reset the GuC just before (re)loading, so
+ * that the state and timing are fairly predictable
+ */
+ err = i915_reset_guc(dev_priv);
if (err) {
- DRM_ERROR("GuC fw reload attempt failed, ret=%d\n", err);
+ DRM_ERROR("GuC reset failed, err %d\n", err);
goto fail;
}
+
+ err = guc_ucode_xfer(dev_priv);
+ if (!err)
+ break;
+
+ if (--retries == 0)
+ goto fail;
+
+ DRM_INFO("GuC fw load failed, err %d; will reset and "
+ "retry %d more time(s)\n", err, retries);
}
guc_fw->guc_fw_load_status = GUC_FIRMWARE_SUCCESS;
@@ -485,6 +482,7 @@ int intel_guc_ucode_load(struct drm_device *dev)
return 0;
fail:
+ DRM_ERROR("GuC firmware load failed, err %d\n", err);
if (guc_fw->guc_fw_load_status == GUC_FIRMWARE_PENDING)
guc_fw->guc_fw_load_status = GUC_FIRMWARE_FAIL;
--
1.9.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 16+ messages in thread
* ✗ Fi.CI.BAT: failure for Fixes and worarounds for GuC issues (rev3)
2016-03-21 10:16 [PATCH 0/7] Fixes and worarounds for GuC issues Dave Gordon
` (7 preceding siblings ...)
2016-03-21 16:23 ` ✗ Fi.CI.BAT: warning for Fixes and worarounds for GuC issues Patchwork
@ 2016-03-24 8:13 ` Patchwork
8 siblings, 0 replies; 16+ messages in thread
From: Patchwork @ 2016-03-24 8:13 UTC (permalink / raw)
To: Dave Gordon; +Cc: intel-gfx
== Series Details ==
Series: Fixes and worarounds for GuC issues (rev3)
URL : https://patchwork.freedesktop.org/series/4694/
State : failure
== Summary ==
Series 4694v3 Fixes and worarounds for GuC issues
2016-03-23T16:43:55.381802 http://patchwork.freedesktop.org/api/1.0/series/4694/revisions/3/mbox/
Applying: drm/i915/guc: Reset GuC and retry on firmware load failure
Using index info to reconstruct a base tree...
M drivers/gpu/drm/i915/i915_drv.h
M drivers/gpu/drm/i915/i915_reg.h
Falling back to patching base and 3-way merge...
Auto-merging drivers/gpu/drm/i915/i915_reg.h
Auto-merging drivers/gpu/drm/i915/i915_drv.h
CONFLICT (content): Merge conflict in drivers/gpu/drm/i915/i915_drv.h
Patch failed at 0001 drm/i915/guc: Reset GuC and retry on firmware load failure
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2016-03-24 8:13 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-03-21 10:16 [PATCH 0/7] Fixes and worarounds for GuC issues Dave Gordon
2016-03-21 10:16 ` [PATCH 1/7] drm/i915/guc: Reset GuC and retry on firmware load failure Dave Gordon
2016-03-21 16:58 ` Arun Siluvery
2016-03-23 16:03 ` Dave Gordon
2016-03-21 10:16 ` [PATCH 2/7] drm/i915/guc: always reset GuC before loading firmware Dave Gordon
2016-03-22 13:32 ` Arun Siluvery
2016-03-22 18:42 ` Arun Siluvery
2016-03-23 17:05 ` [PATCH v2 1/2] drm/i915/guc: Reset GuC and retry on firmware load failure Dave Gordon
2016-03-23 17:05 ` [PATCH v2 2/2] drm/i915/guc: always reset GuC before loading firmware Dave Gordon
2016-03-21 10:16 ` [PATCH 3/7] drm/i915/guc: add doorbell map to debugfs/i915_guc_info Dave Gordon
2016-03-21 10:16 ` [PATCH 4/7] drm/i915/guc: move guc_ring_doorbell() nearer to callsite Dave Gordon
2016-03-21 10:16 ` [PATCH 5/7] drm/i915/guc: refactor doorbell management code Dave Gordon
2016-03-21 10:16 ` [PATCH 6/7] drm/i915/guc: (re)initialise doorbell h/w when enabling GuC submission Dave Gordon
2016-03-21 10:16 ` [PATCH 7/7] drm/i915/guc: disable GuC submission earlier during GuC (re)load Dave Gordon
2016-03-21 16:23 ` ✗ Fi.CI.BAT: warning for Fixes and worarounds for GuC issues Patchwork
2016-03-24 8:13 ` ✗ Fi.CI.BAT: failure for Fixes and worarounds for GuC issues (rev3) Patchwork
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).