public inbox for intel-gfx@lists.freedesktop.org
 help / color / mirror / Atom feed
* [PATCH] drm/i915: Include GuC fw version in error state
@ 2017-02-23 23:11 Michel Thierry
  2017-02-24  0:22 ` ✓ Fi.CI.BAT: success for " Patchwork
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Michel Thierry @ 2017-02-23 23:11 UTC (permalink / raw)
  To: intel-gfx

There was no way to check if the platform is running the latest firmware.

Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gpu_error.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 2b1d15668192..e022187916ee 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -632,6 +632,16 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 			   CSR_VERSION_MINOR(csr->version));
 	}
 
+	if (HAS_GUC_UCODE(dev_priv)) {
+		struct intel_uc_fw *guc_fw = &dev_priv->guc.fw;
+
+		err_printf(m, "GuC loaded: %s\n",
+			   yesno(guc_fw->load_status ==
+				 INTEL_UC_FIRMWARE_SUCCESS));
+		err_printf(m, "GuC fw version: %d.%d\n",
+			   guc_fw->major_ver_found, guc_fw->minor_ver_found);
+	}
+
 	err_printf(m, "EIR: 0x%08x\n", error->eir);
 	err_printf(m, "IER: 0x%08x\n", error->ier);
 	for (i = 0; i < error->ngtier; i++)
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* ✓ Fi.CI.BAT: success for drm/i915: Include GuC fw version in error state
  2017-02-23 23:11 [PATCH] drm/i915: Include GuC fw version in error state Michel Thierry
@ 2017-02-24  0:22 ` Patchwork
  2017-02-24  3:43 ` [PATCH] " Kamble, Sagar A
  2017-02-24 10:40 ` Michal Wajdeczko
  2 siblings, 0 replies; 13+ messages in thread
From: Patchwork @ 2017-02-24  0:22 UTC (permalink / raw)
  To: Michel Thierry; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Include GuC fw version in error state
URL   : https://patchwork.freedesktop.org/series/20181/
State : success

== Summary ==

Series 20181v1 drm/i915: Include GuC fw version in error state
https://patchwork.freedesktop.org/api/1.0/series/20181/revisions/1/mbox/

fi-bdw-5557u     total:278  pass:267  dwarn:0   dfail:0   fail:0   skip:11 
fi-bsw-n3050     total:278  pass:239  dwarn:0   dfail:0   fail:0   skip:39 
fi-bxt-j4205     total:278  pass:259  dwarn:0   dfail:0   fail:0   skip:19 
fi-bxt-t5700     total:108  pass:95   dwarn:0   dfail:0   fail:0   skip:12 
fi-byt-j1900     total:278  pass:251  dwarn:0   dfail:0   fail:0   skip:27 
fi-byt-n2820     total:278  pass:247  dwarn:0   dfail:0   fail:0   skip:31 
fi-hsw-4770      total:278  pass:262  dwarn:0   dfail:0   fail:0   skip:16 
fi-hsw-4770r     total:278  pass:262  dwarn:0   dfail:0   fail:0   skip:16 
fi-ilk-650       total:278  pass:228  dwarn:0   dfail:0   fail:0   skip:50 
fi-ivb-3520m     total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18 
fi-ivb-3770      total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18 
fi-kbl-7500u     total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18 
fi-skl-6260u     total:278  pass:268  dwarn:0   dfail:0   fail:0   skip:10 
fi-skl-6700hq    total:278  pass:261  dwarn:0   dfail:0   fail:0   skip:17 
fi-skl-6700k     total:278  pass:256  dwarn:4   dfail:0   fail:0   skip:18 
fi-skl-6770hq    total:278  pass:268  dwarn:0   dfail:0   fail:0   skip:10 
fi-snb-2520m     total:278  pass:250  dwarn:0   dfail:0   fail:0   skip:28 
fi-snb-2600      total:278  pass:249  dwarn:0   dfail:0   fail:0   skip:29 

c6638f903295bbbd29957b878a42b83c5566250c drm-tip: 2017y-02m-23d-22h-50m-35s UTC integration manifest
b721bdd drm/i915: Include GuC fw version in error state

== Logs ==

For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_3956/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] drm/i915: Include GuC fw version in error state
  2017-02-23 23:11 [PATCH] drm/i915: Include GuC fw version in error state Michel Thierry
  2017-02-24  0:22 ` ✓ Fi.CI.BAT: success for " Patchwork
@ 2017-02-24  3:43 ` Kamble, Sagar A
  2017-02-24  9:13   ` Chris Wilson
  2017-02-24 10:40 ` Michal Wajdeczko
  2 siblings, 1 reply; 13+ messages in thread
From: Kamble, Sagar A @ 2017-02-24  3:43 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 1273 bytes --]

Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>

On 2/24/2017 4:41 AM, Michel Thierry wrote:
> There was no way to check if the platform is running the latest firmware.
>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gpu_error.c | 10 ++++++++++
>   1 file changed, 10 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 2b1d15668192..e022187916ee 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -632,6 +632,16 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
>   			   CSR_VERSION_MINOR(csr->version));
>   	}
>   
> +	if (HAS_GUC_UCODE(dev_priv)) {
> +		struct intel_uc_fw *guc_fw = &dev_priv->guc.fw;
> +
> +		err_printf(m, "GuC loaded: %s\n",
> +			   yesno(guc_fw->load_status ==
> +				 INTEL_UC_FIRMWARE_SUCCESS));
> +		err_printf(m, "GuC fw version: %d.%d\n",
> +			   guc_fw->major_ver_found, guc_fw->minor_ver_found);
> +	}
> +
>   	err_printf(m, "EIR: 0x%08x\n", error->eir);
>   	err_printf(m, "IER: 0x%08x\n", error->ier);
>   	for (i = 0; i < error->ngtier; i++)


[-- Attachment #1.2: Type: text/html, Size: 1948 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] drm/i915: Include GuC fw version in error state
  2017-02-24  3:43 ` [PATCH] " Kamble, Sagar A
@ 2017-02-24  9:13   ` Chris Wilson
  2017-02-24 10:43     ` Michal Wajdeczko
  0 siblings, 1 reply; 13+ messages in thread
From: Chris Wilson @ 2017-02-24  9:13 UTC (permalink / raw)
  To: Kamble, Sagar A; +Cc: intel-gfx

On Fri, Feb 24, 2017 at 09:13:05AM +0530, Kamble, Sagar A wrote:
>    Reviewed-by: Sagar Arun Kamble [1]<sagar.a.kamble@intel.com>
> 
>    On 2/24/2017 4:41 AM, Michel Thierry wrote:
> 
>  There was no way to check if the platform is running the latest firmware.
> 
>  Cc: Tvrtko Ursulin [2]<tvrtko.ursulin@intel.com>
>  Cc: Arkadiusz Hiler [3]<arkadiusz.hiler@intel.com>
>  Signed-off-by: Michel Thierry [4]<michel.thierry@intel.com>
>  ---
>   drivers/gpu/drm/i915/i915_gpu_error.c | 10 ++++++++++
>   1 file changed, 10 insertions(+)
> 
>  diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
>  index 2b1d15668192..e022187916ee 100644
>  --- a/drivers/gpu/drm/i915/i915_gpu_error.c
>  +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
>  @@ -632,6 +632,16 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
>                             CSR_VERSION_MINOR(csr->version));
>          }
> 
>  +       if (HAS_GUC_UCODE(dev_priv)) {
>  +               struct intel_uc_fw *guc_fw = &dev_priv->guc.fw;
>  +
>  +               err_printf(m, "GuC loaded: %s\n",
>  +                          yesno(guc_fw->load_status ==
>  +                                INTEL_UC_FIRMWARE_SUCCESS));
>  +               err_printf(m, "GuC fw version: %d.%d\n",
>  +                          guc_fw->major_ver_found, guc_fw->minor_ver_found);
>  +       }
>  +

Hmm. The firmware may change between the hang and cat
/sys/class/drm/card0/error (as it will be reloaded after the reset).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] drm/i915: Include GuC fw version in error state
  2017-02-23 23:11 [PATCH] drm/i915: Include GuC fw version in error state Michel Thierry
  2017-02-24  0:22 ` ✓ Fi.CI.BAT: success for " Patchwork
  2017-02-24  3:43 ` [PATCH] " Kamble, Sagar A
@ 2017-02-24 10:40 ` Michal Wajdeczko
  2017-02-24 16:15   ` Michel Thierry
  2 siblings, 1 reply; 13+ messages in thread
From: Michal Wajdeczko @ 2017-02-24 10:40 UTC (permalink / raw)
  To: Michel Thierry; +Cc: intel-gfx

On Thu, Feb 23, 2017 at 03:11:37PM -0800, Michel Thierry wrote:
> There was no way to check if the platform is running the latest firmware.

Can we also add similar patch for the HuC ?

> 
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gpu_error.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 2b1d15668192..e022187916ee 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -632,6 +632,16 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
>  			   CSR_VERSION_MINOR(csr->version));
>  	}
>  
> +	if (HAS_GUC_UCODE(dev_priv)) {
> +		struct intel_uc_fw *guc_fw = &dev_priv->guc.fw;

I would preffer to use HAS_GUC and intel_guc* here.


> +
> +		err_printf(m, "GuC loaded: %s\n",
> +			   yesno(guc_fw->load_status ==
> +				 INTEL_UC_FIRMWARE_SUCCESS));

Hmm, as we do have more detailed load status, why limiting it to yes/no only?


-Michal  

> +		err_printf(m, "GuC fw version: %d.%d\n",
> +			   guc_fw->major_ver_found, guc_fw->minor_ver_found);
> +	}
> +
>  	err_printf(m, "EIR: 0x%08x\n", error->eir);
>  	err_printf(m, "IER: 0x%08x\n", error->ier);
>  	for (i = 0; i < error->ngtier; i++)
> -- 
> 2.11.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] drm/i915: Include GuC fw version in error state
  2017-02-24  9:13   ` Chris Wilson
@ 2017-02-24 10:43     ` Michal Wajdeczko
  2017-02-24 10:49       ` Chris Wilson
  0 siblings, 1 reply; 13+ messages in thread
From: Michal Wajdeczko @ 2017-02-24 10:43 UTC (permalink / raw)
  To: Chris Wilson, Kamble, Sagar A, Michel Thierry, intel-gfx

On Fri, Feb 24, 2017 at 09:13:29AM +0000, Chris Wilson wrote:
> On Fri, Feb 24, 2017 at 09:13:05AM +0530, Kamble, Sagar A wrote:
> >    Reviewed-by: Sagar Arun Kamble [1]<sagar.a.kamble@intel.com>
> > 
> >    On 2/24/2017 4:41 AM, Michel Thierry wrote:
> > 
> >  There was no way to check if the platform is running the latest firmware.
> > 
> >  Cc: Tvrtko Ursulin [2]<tvrtko.ursulin@intel.com>
> >  Cc: Arkadiusz Hiler [3]<arkadiusz.hiler@intel.com>
> >  Signed-off-by: Michel Thierry [4]<michel.thierry@intel.com>
> >  ---
> >   drivers/gpu/drm/i915/i915_gpu_error.c | 10 ++++++++++
> >   1 file changed, 10 insertions(+)
> > 
> >  diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> >  index 2b1d15668192..e022187916ee 100644
> >  --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> >  +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> >  @@ -632,6 +632,16 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
> >                             CSR_VERSION_MINOR(csr->version));
> >          }
> > 
> >  +       if (HAS_GUC_UCODE(dev_priv)) {
> >  +               struct intel_uc_fw *guc_fw = &dev_priv->guc.fw;
> >  +
> >  +               err_printf(m, "GuC loaded: %s\n",
> >  +                          yesno(guc_fw->load_status ==
> >  +                                INTEL_UC_FIRMWARE_SUCCESS));
> >  +               err_printf(m, "GuC fw version: %d.%d\n",
> >  +                          guc_fw->major_ver_found, guc_fw->minor_ver_found);
> >  +       }
> >  +
> 
> Hmm. The firmware may change between the hang and cat
> /sys/class/drm/card0/error (as it will be reloaded after the reset).

Btw, maybe we should add counter that will be incremented on each fw reload
and reported here ?

-Michal

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] drm/i915: Include GuC fw version in error state
  2017-02-24 10:43     ` Michal Wajdeczko
@ 2017-02-24 10:49       ` Chris Wilson
  2017-02-24 15:45         ` Kamble, Sagar A
  2017-02-24 16:30         ` Michel Thierry
  0 siblings, 2 replies; 13+ messages in thread
From: Chris Wilson @ 2017-02-24 10:49 UTC (permalink / raw)
  To: Michal Wajdeczko; +Cc: intel-gfx

On Fri, Feb 24, 2017 at 11:43:32AM +0100, Michal Wajdeczko wrote:
> On Fri, Feb 24, 2017 at 09:13:29AM +0000, Chris Wilson wrote:
> > On Fri, Feb 24, 2017 at 09:13:05AM +0530, Kamble, Sagar A wrote:
> > >    Reviewed-by: Sagar Arun Kamble [1]<sagar.a.kamble@intel.com>
> > > 
> > >    On 2/24/2017 4:41 AM, Michel Thierry wrote:
> > > 
> > >  There was no way to check if the platform is running the latest firmware.
> > > 
> > >  Cc: Tvrtko Ursulin [2]<tvrtko.ursulin@intel.com>
> > >  Cc: Arkadiusz Hiler [3]<arkadiusz.hiler@intel.com>
> > >  Signed-off-by: Michel Thierry [4]<michel.thierry@intel.com>
> > >  ---
> > >   drivers/gpu/drm/i915/i915_gpu_error.c | 10 ++++++++++
> > >   1 file changed, 10 insertions(+)
> > > 
> > >  diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> > >  index 2b1d15668192..e022187916ee 100644
> > >  --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> > >  +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> > >  @@ -632,6 +632,16 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
> > >                             CSR_VERSION_MINOR(csr->version));
> > >          }
> > > 
> > >  +       if (HAS_GUC_UCODE(dev_priv)) {
> > >  +               struct intel_uc_fw *guc_fw = &dev_priv->guc.fw;
> > >  +
> > >  +               err_printf(m, "GuC loaded: %s\n",
> > >  +                          yesno(guc_fw->load_status ==
> > >  +                                INTEL_UC_FIRMWARE_SUCCESS));
> > >  +               err_printf(m, "GuC fw version: %d.%d\n",
> > >  +                          guc_fw->major_ver_found, guc_fw->minor_ver_found);
> > >  +       }
> > >  +
> > 
> > Hmm. The firmware may change between the hang and cat
> > /sys/class/drm/card0/error (as it will be reloaded after the reset).
> 
> Btw, maybe we should add counter that will be incremented on each fw reload
> and reported here ?

If it occurs to you that we need it for post-mortem debugging and having
it is worth more than any potential confusion....

I can see the need for knowing what guc/huc/dmc/etc was running at the
time of a hang - I just hope that what was previously running before an
earlier reset doesn't contribute. But that's why we focus on the first
error in a system...
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] drm/i915: Include GuC fw version in error state
  2017-02-24 10:49       ` Chris Wilson
@ 2017-02-24 15:45         ` Kamble, Sagar A
  2017-02-24 16:30         ` Michel Thierry
  1 sibling, 0 replies; 13+ messages in thread
From: Kamble, Sagar A @ 2017-02-24 15:45 UTC (permalink / raw)
  To: Chris Wilson, Michal Wajdeczko, Michel Thierry, intel-gfx



On 2/24/2017 4:19 PM, Chris Wilson wrote:
> On Fri, Feb 24, 2017 at 11:43:32AM +0100, Michal Wajdeczko wrote:
>> On Fri, Feb 24, 2017 at 09:13:29AM +0000, Chris Wilson wrote:
>>> On Fri, Feb 24, 2017 at 09:13:05AM +0530, Kamble, Sagar A wrote:
>>>>     Reviewed-by: Sagar Arun Kamble [1]<sagar.a.kamble@intel.com>
>>>>
>>>>     On 2/24/2017 4:41 AM, Michel Thierry wrote:
>>>>
>>>>   There was no way to check if the platform is running the latest firmware.
>>>>
>>>>   Cc: Tvrtko Ursulin [2]<tvrtko.ursulin@intel.com>
>>>>   Cc: Arkadiusz Hiler [3]<arkadiusz.hiler@intel.com>
>>>>   Signed-off-by: Michel Thierry [4]<michel.thierry@intel.com>
>>>>   ---
>>>>    drivers/gpu/drm/i915/i915_gpu_error.c | 10 ++++++++++
>>>>    1 file changed, 10 insertions(+)
>>>>
>>>>   diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
>>>>   index 2b1d15668192..e022187916ee 100644
>>>>   --- a/drivers/gpu/drm/i915/i915_gpu_error.c
>>>>   +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
>>>>   @@ -632,6 +632,16 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
>>>>                              CSR_VERSION_MINOR(csr->version));
>>>>           }
>>>>
>>>>   +       if (HAS_GUC_UCODE(dev_priv)) {
>>>>   +               struct intel_uc_fw *guc_fw = &dev_priv->guc.fw;
>>>>   +
>>>>   +               err_printf(m, "GuC loaded: %s\n",
>>>>   +                          yesno(guc_fw->load_status ==
>>>>   +                                INTEL_UC_FIRMWARE_SUCCESS));
>>>>   +               err_printf(m, "GuC fw version: %d.%d\n",
>>>>   +                          guc_fw->major_ver_found, guc_fw->minor_ver_found);
>>>>   +       }
>>>>   +
>>> Hmm. The firmware may change between the hang and cat
>>> /sys/class/drm/card0/error (as it will be reloaded after the reset).
>> Btw, maybe we should add counter that will be incremented on each fw reload
>> and reported here ?
> If it occurs to you that we need it for post-mortem debugging and having
> it is worth more than any potential confusion....
>
> I can see the need for knowing what guc/huc/dmc/etc was running at the
> time of a hang - I just hope that what was previously running before an
> earlier reset doesn't contribute. But that's why we focus on the first
> error in a system...
> -Chris
>
GT reset count is present already in error state. GuC kernel parameters 
are present and this
change will help us identify which firmware issue was encountered.
So I feel printing ver_found should be enough.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] drm/i915: Include GuC fw version in error state
  2017-02-24 10:40 ` Michal Wajdeczko
@ 2017-02-24 16:15   ` Michel Thierry
  2017-02-24 16:21     ` Michel Thierry
  0 siblings, 1 reply; 13+ messages in thread
From: Michel Thierry @ 2017-02-24 16:15 UTC (permalink / raw)
  To: Michal Wajdeczko; +Cc: intel-gfx



On 2/24/2017 2:40 AM, Michal Wajdeczko wrote:
> On Thu, Feb 23, 2017 at 03:11:37PM -0800, Michel Thierry wrote:
>> There was no way to check if the platform is running the latest firmware.
>
> Can we also add similar patch for the HuC ?
>

Please don't tell me the HuC can hang the gpu too.

>>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
>> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
>> ---
>>  drivers/gpu/drm/i915/i915_gpu_error.c | 10 ++++++++++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
>> index 2b1d15668192..e022187916ee 100644
>> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
>> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
>> @@ -632,6 +632,16 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
>>  			   CSR_VERSION_MINOR(csr->version));
>>  	}
>>
>> +	if (HAS_GUC_UCODE(dev_priv)) {
>> +		struct intel_uc_fw *guc_fw = &dev_priv->guc.fw;
>
> I would preffer to use HAS_GUC and intel_guc* here.
>
>
>> +
>> +		err_printf(m, "GuC loaded: %s\n",
>> +			   yesno(guc_fw->load_status ==
>> +				 INTEL_UC_FIRMWARE_SUCCESS));
>
> Hmm, as we do have more detailed load status, why limiting it to yes/no only?
>
>
> -Michal
>
>> +		err_printf(m, "GuC fw version: %d.%d\n",
>> +			   guc_fw->major_ver_found, guc_fw->minor_ver_found);
>> +	}
>> +
>>  	err_printf(m, "EIR: 0x%08x\n", error->eir);
>>  	err_printf(m, "IER: 0x%08x\n", error->ier);
>>  	for (i = 0; i < error->ngtier; i++)
>> --
>> 2.11.0
>>
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] drm/i915: Include GuC fw version in error state
  2017-02-24 16:15   ` Michel Thierry
@ 2017-02-24 16:21     ` Michel Thierry
  0 siblings, 0 replies; 13+ messages in thread
From: Michel Thierry @ 2017-02-24 16:21 UTC (permalink / raw)
  To: Michal Wajdeczko; +Cc: intel-gfx

On 2/24/2017 8:15 AM, Michel Thierry wrote:
>
>
> On 2/24/2017 2:40 AM, Michal Wajdeczko wrote:
>> On Thu, Feb 23, 2017 at 03:11:37PM -0800, Michel Thierry wrote:
>>> There was no way to check if the platform is running the latest
>>> firmware.
>>
>> Can we also add similar patch for the HuC ?
>>
>
> Please don't tell me the HuC can hang the gpu too.
>
>>>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
>>> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
>>> ---
>>>  drivers/gpu/drm/i915/i915_gpu_error.c | 10 ++++++++++
>>>  1 file changed, 10 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c
>>> b/drivers/gpu/drm/i915/i915_gpu_error.c
>>> index 2b1d15668192..e022187916ee 100644
>>> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
>>> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
>>> @@ -632,6 +632,16 @@ int i915_error_state_to_str(struct
>>> drm_i915_error_state_buf *m,
>>>                 CSR_VERSION_MINOR(csr->version));
>>>      }
>>>
>>> +    if (HAS_GUC_UCODE(dev_priv)) {
>>> +        struct intel_uc_fw *guc_fw = &dev_priv->guc.fw;
>>
>> I would preffer to use HAS_GUC and intel_guc* here.
>>
>>
>>> +
>>> +        err_printf(m, "GuC loaded: %s\n",
>>> +               yesno(guc_fw->load_status ==
>>> +                 INTEL_UC_FIRMWARE_SUCCESS));
>>
>> Hmm, as we do have more detailed load status, why limiting it to
>> yes/no only?

Will it help in the post-mortem debug?
My idea was, if the fw didn't load, we can take it completely out of the 
picture.

>> -Michal
>>
>>> +        err_printf(m, "GuC fw version: %d.%d\n",
>>> +               guc_fw->major_ver_found, guc_fw->minor_ver_found);
>>> +    }
>>> +
>>>      err_printf(m, "EIR: 0x%08x\n", error->eir);
>>>      err_printf(m, "IER: 0x%08x\n", error->ier);
>>>      for (i = 0; i < error->ngtier; i++)
>>> --
>>> 2.11.0
>>>
>>> _______________________________________________
>>> Intel-gfx mailing list
>>> Intel-gfx@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] drm/i915: Include GuC fw version in error state
  2017-02-24 10:49       ` Chris Wilson
  2017-02-24 15:45         ` Kamble, Sagar A
@ 2017-02-24 16:30         ` Michel Thierry
  2017-02-24 17:15           ` Chris Wilson
  1 sibling, 1 reply; 13+ messages in thread
From: Michel Thierry @ 2017-02-24 16:30 UTC (permalink / raw)
  To: Chris Wilson, Michal Wajdeczko, Kamble, Sagar A, intel-gfx

On 2/24/2017 2:49 AM, Chris Wilson wrote:
> On Fri, Feb 24, 2017 at 11:43:32AM +0100, Michal Wajdeczko wrote:
>> On Fri, Feb 24, 2017 at 09:13:29AM +0000, Chris Wilson wrote:
>>> On Fri, Feb 24, 2017 at 09:13:05AM +0530, Kamble, Sagar A wrote:
>>>>    Reviewed-by: Sagar Arun Kamble [1]<sagar.a.kamble@intel.com>
>>>>
>>>>    On 2/24/2017 4:41 AM, Michel Thierry wrote:
>>>>
>>>>  There was no way to check if the platform is running the latest firmware.
>>>>
>>>>  Cc: Tvrtko Ursulin [2]<tvrtko.ursulin@intel.com>
>>>>  Cc: Arkadiusz Hiler [3]<arkadiusz.hiler@intel.com>
>>>>  Signed-off-by: Michel Thierry [4]<michel.thierry@intel.com>
>>>>  ---
>>>>   drivers/gpu/drm/i915/i915_gpu_error.c | 10 ++++++++++
>>>>   1 file changed, 10 insertions(+)
>>>>
>>>>  diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
>>>>  index 2b1d15668192..e022187916ee 100644
>>>>  --- a/drivers/gpu/drm/i915/i915_gpu_error.c
>>>>  +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
>>>>  @@ -632,6 +632,16 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
>>>>                             CSR_VERSION_MINOR(csr->version));
>>>>          }
>>>>
>>>>  +       if (HAS_GUC_UCODE(dev_priv)) {
>>>>  +               struct intel_uc_fw *guc_fw = &dev_priv->guc.fw;
>>>>  +
>>>>  +               err_printf(m, "GuC loaded: %s\n",
>>>>  +                          yesno(guc_fw->load_status ==
>>>>  +                                INTEL_UC_FIRMWARE_SUCCESS));
>>>>  +               err_printf(m, "GuC fw version: %d.%d\n",
>>>>  +                          guc_fw->major_ver_found, guc_fw->minor_ver_found);
>>>>  +       }
>>>>  +
>>>
>>> Hmm. The firmware may change between the hang and cat
>>> /sys/class/drm/card0/error (as it will be reloaded after the reset).
>>
>> Btw, maybe we should add counter that will be incremented on each fw reload
>> and reported here ?
>
> If it occurs to you that we need it for post-mortem debugging and having
> it is worth more than any potential confusion....
>
> I can see the need for knowing what guc/huc/dmc/etc was running at the
> time of a hang - I just hope that what was previously running before an
> earlier reset doesn't contribute. But that's why we focus on the first
> error in a system...

Can the firmware change?
Last time I checked the filename was hard-coded in the driver. It's true 
that the load process could fail and then the information be incorrect.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] drm/i915: Include GuC fw version in error state
  2017-02-24 16:30         ` Michel Thierry
@ 2017-02-24 17:15           ` Chris Wilson
  2017-02-24 17:32             ` Michel Thierry
  0 siblings, 1 reply; 13+ messages in thread
From: Chris Wilson @ 2017-02-24 17:15 UTC (permalink / raw)
  To: Michel Thierry; +Cc: intel-gfx

On Fri, Feb 24, 2017 at 08:30:43AM -0800, Michel Thierry wrote:
> On 2/24/2017 2:49 AM, Chris Wilson wrote:
> >On Fri, Feb 24, 2017 at 11:43:32AM +0100, Michal Wajdeczko wrote:
> >>On Fri, Feb 24, 2017 at 09:13:29AM +0000, Chris Wilson wrote:
> >>>On Fri, Feb 24, 2017 at 09:13:05AM +0530, Kamble, Sagar A wrote:
> >>>>   Reviewed-by: Sagar Arun Kamble [1]<sagar.a.kamble@intel.com>
> >>>>
> >>>>   On 2/24/2017 4:41 AM, Michel Thierry wrote:
> >>>>
> >>>> There was no way to check if the platform is running the latest firmware.
> >>>>
> >>>> Cc: Tvrtko Ursulin [2]<tvrtko.ursulin@intel.com>
> >>>> Cc: Arkadiusz Hiler [3]<arkadiusz.hiler@intel.com>
> >>>> Signed-off-by: Michel Thierry [4]<michel.thierry@intel.com>
> >>>> ---
> >>>>  drivers/gpu/drm/i915/i915_gpu_error.c | 10 ++++++++++
> >>>>  1 file changed, 10 insertions(+)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> >>>> index 2b1d15668192..e022187916ee 100644
> >>>> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> >>>> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> >>>> @@ -632,6 +632,16 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
> >>>>                            CSR_VERSION_MINOR(csr->version));
> >>>>         }
> >>>>
> >>>> +       if (HAS_GUC_UCODE(dev_priv)) {
> >>>> +               struct intel_uc_fw *guc_fw = &dev_priv->guc.fw;
> >>>> +
> >>>> +               err_printf(m, "GuC loaded: %s\n",
> >>>> +                          yesno(guc_fw->load_status ==
> >>>> +                                INTEL_UC_FIRMWARE_SUCCESS));
> >>>> +               err_printf(m, "GuC fw version: %d.%d\n",
> >>>> +                          guc_fw->major_ver_found, guc_fw->minor_ver_found);
> >>>> +       }
> >>>> +
> >>>
> >>>Hmm. The firmware may change between the hang and cat
> >>>/sys/class/drm/card0/error (as it will be reloaded after the reset).
> >>
> >>Btw, maybe we should add counter that will be incremented on each fw reload
> >>and reported here ?
> >
> >If it occurs to you that we need it for post-mortem debugging and having
> >it is worth more than any potential confusion....
> >
> >I can see the need for knowing what guc/huc/dmc/etc was running at the
> >time of a hang - I just hope that what was previously running before an
> >earlier reset doesn't contribute. But that's why we focus on the first
> >error in a system...
> 
> Can the firmware change?
> Last time I checked the filename was hard-coded in the driver. It's
> true that the load process could fail and then the information be
> incorrect.

Assume it won't be hardcoded for ever (or at least no more than a week)...
And yes, the filesystem state may have changed since the previous load.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] drm/i915: Include GuC fw version in error state
  2017-02-24 17:15           ` Chris Wilson
@ 2017-02-24 17:32             ` Michel Thierry
  0 siblings, 0 replies; 13+ messages in thread
From: Michel Thierry @ 2017-02-24 17:32 UTC (permalink / raw)
  To: Chris Wilson, Michal Wajdeczko, Kamble, Sagar A, intel-gfx



On 2/24/2017 9:15 AM, Chris Wilson wrote:
> On Fri, Feb 24, 2017 at 08:30:43AM -0800, Michel Thierry wrote:
>> On 2/24/2017 2:49 AM, Chris Wilson wrote:
>>> On Fri, Feb 24, 2017 at 11:43:32AM +0100, Michal Wajdeczko wrote:
>>>> On Fri, Feb 24, 2017 at 09:13:29AM +0000, Chris Wilson wrote:
>>>>> On Fri, Feb 24, 2017 at 09:13:05AM +0530, Kamble, Sagar A wrote:
>>>>>>   Reviewed-by: Sagar Arun Kamble [1]<sagar.a.kamble@intel.com>
>>>>>>
>>>>>>   On 2/24/2017 4:41 AM, Michel Thierry wrote:
>>>>>>
>>>>>> There was no way to check if the platform is running the latest firmware.
>>>>>>
>>>>>> Cc: Tvrtko Ursulin [2]<tvrtko.ursulin@intel.com>
>>>>>> Cc: Arkadiusz Hiler [3]<arkadiusz.hiler@intel.com>
>>>>>> Signed-off-by: Michel Thierry [4]<michel.thierry@intel.com>
>>>>>> ---
>>>>>>  drivers/gpu/drm/i915/i915_gpu_error.c | 10 ++++++++++
>>>>>>  1 file changed, 10 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
>>>>>> index 2b1d15668192..e022187916ee 100644
>>>>>> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
>>>>>> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
>>>>>> @@ -632,6 +632,16 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
>>>>>>                            CSR_VERSION_MINOR(csr->version));
>>>>>>         }
>>>>>>
>>>>>> +       if (HAS_GUC_UCODE(dev_priv)) {
>>>>>> +               struct intel_uc_fw *guc_fw = &dev_priv->guc.fw;
>>>>>> +
>>>>>> +               err_printf(m, "GuC loaded: %s\n",
>>>>>> +                          yesno(guc_fw->load_status ==
>>>>>> +                                INTEL_UC_FIRMWARE_SUCCESS));
>>>>>> +               err_printf(m, "GuC fw version: %d.%d\n",
>>>>>> +                          guc_fw->major_ver_found, guc_fw->minor_ver_found);
>>>>>> +       }
>>>>>> +
>>>>>
>>>>> Hmm. The firmware may change between the hang and cat
>>>>> /sys/class/drm/card0/error (as it will be reloaded after the reset).
>>>>
>>>> Btw, maybe we should add counter that will be incremented on each fw reload
>>>> and reported here ?
>>>
>>> If it occurs to you that we need it for post-mortem debugging and having
>>> it is worth more than any potential confusion....
>>>
>>> I can see the need for knowing what guc/huc/dmc/etc was running at the
>>> time of a hang - I just hope that what was previously running before an
>>> earlier reset doesn't contribute. But that's why we focus on the first
>>> error in a system...
>>
>> Can the firmware change?
>> Last time I checked the filename was hard-coded in the driver. It's
>> true that the load process could fail and then the information be
>> incorrect.
>
> Assume it won't be hardcoded for ever (or at least no more than a week)...
> And yes, the filesystem state may have changed since the previous load.

ok, I'll add an i915_capture_fw_state to collect the information before 
the reset (for dmc/guc/huc).

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2017-02-24 17:32 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-02-23 23:11 [PATCH] drm/i915: Include GuC fw version in error state Michel Thierry
2017-02-24  0:22 ` ✓ Fi.CI.BAT: success for " Patchwork
2017-02-24  3:43 ` [PATCH] " Kamble, Sagar A
2017-02-24  9:13   ` Chris Wilson
2017-02-24 10:43     ` Michal Wajdeczko
2017-02-24 10:49       ` Chris Wilson
2017-02-24 15:45         ` Kamble, Sagar A
2017-02-24 16:30         ` Michel Thierry
2017-02-24 17:15           ` Chris Wilson
2017-02-24 17:32             ` Michel Thierry
2017-02-24 10:40 ` Michal Wajdeczko
2017-02-24 16:15   ` Michel Thierry
2017-02-24 16:21     ` Michel Thierry

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox