Intel-XE Archive on lore.kernel.org
From: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
To: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>,
	<intel-xe@lists.freedesktop.org>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>,
	Rodrigo Vivi <rodrigo.vivi@intel.com>
Subject: Re: [Intel-xe] [PATCH v9 3/3] drm/xe: Introduce fault injection for gt reset
Date: Mon, 31 Jul 2023 20:00:01 +0530
Message-ID: <ZMfFaXW7607Cy45j@bvivekan-mobl>
In-Reply-To: <20230726232650.3873897-4-himal.prasad.ghimiray@intel.com>

On 27.07.2023 04:56, Himal Prasad Ghimiray wrote:
> To trigger gt reset failure:
>  echo 100 >  /sys/kernel/debug/dri/<cardX>/fail_gt_reset/probability
>  echo 2 >  /sys/kernel/debug/dri/<cardX>/fail_gt_reset/times
> 
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> 
> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_debugfs.c | 10 ++++++++++
>  drivers/gpu/drm/xe/xe_gt.c      |  8 +++++++-
>  drivers/gpu/drm/xe/xe_gt.h      | 14 ++++++++++++++
>  3 files changed, 31 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_debugfs.c b/drivers/gpu/drm/xe/xe_debugfs.c
> index 047341d5689a..1fd016e6f7a0 100644
> --- a/drivers/gpu/drm/xe/xe_debugfs.c
> +++ b/drivers/gpu/drm/xe/xe_debugfs.c
> @@ -5,6 +5,7 @@
>  
>  #include "xe_debugfs.h"
>  
> +#include <linux/fault-inject.h>
>  #include <linux/string_helpers.h>
>  
>  #include <drm/drm_debugfs.h>
> @@ -20,6 +21,10 @@
>  #include "xe_vm.h"
>  #endif
>  
> +#ifdef CONFIG_FAULT_INJECTION
> +DECLARE_FAULT_ATTR(gt_reset_failure);
> +#endif
> +
>  static struct xe_device *node_to_xe(struct drm_info_node *node)
>  {
>  	return to_xe_device(node->minor->dev);
> @@ -135,4 +140,9 @@ void xe_debugfs_register(struct xe_device *xe)
>  
>  	for_each_gt(gt, xe, id)
>  		xe_gt_debugfs_register(gt);
> +
> +#ifdef CONFIG_FAULT_INJECTION
> +	fault_create_debugfs_attr("fail_gt_reset", root, &gt_reset_failure);
> +#endif
> +
>  }
> diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> index 5e70e486b27c..691e3baf97c9 100644
> --- a/drivers/gpu/drm/xe/xe_gt.c
> +++ b/drivers/gpu/drm/xe/xe_gt.c
> @@ -525,6 +525,11 @@ static int gt_reset(struct xe_gt *gt)
>  
>  	xe_gt_info(gt, "reset started\n");
>  
> +	if (xe_fault_inject_gt_reset()) {
> +		err = -ECANCELED;
> +		goto err_fail;
> +	}
> +
>  	xe_gt_sanitize(gt);
>  
>  	xe_device_mem_access_get(gt_to_xe(gt));
> @@ -563,6 +568,7 @@ static int gt_reset(struct xe_gt *gt)
>  err_msg:
>  	XE_WARN_ON(xe_uc_start(&gt->uc));
>  	xe_device_mem_access_put(gt_to_xe(gt));
> +err_fail:
>  	xe_gt_err(gt, "reset failed (%pe)\n", ERR_PTR(err));
>  
>  	/* Notify userspace about gt reset failure */
> @@ -584,7 +590,7 @@ void xe_gt_reset_async(struct xe_gt *gt)
>  	xe_gt_info(gt, "trying reset\n");
>  
>  	/* Don't do a reset while one is already in flight */
> -	if (xe_uc_reset_prepare(&gt->uc))
> +	if (!xe_fault_inject_gt_reset() && xe_uc_reset_prepare(&gt->uc))

When `fail_gt_reset/probability` is set to a value less than 100,
xe_fault_inject_gt_reset() will not always return true. So if
xe_fault_inject_gt_reset() returns different values when invoked from
xe_gt_reset_async() and from gt_reset(), we will get unexpected behaviour.

We should avoid calling xe_fault_inject_gt_reset() more than once in a
single reset cycle.
We could exit immediately in xe_gt_reset_async() if fault injection is
enabled.
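
To illustrate the "only once per reset cycle" point, here is a rough
user-space sketch, not driver code and not necessarily what the patch
should do. It samples the fault decision once when the reset is queued
and lets the reset worker consume that recorded decision. All names
(fake_fault_attr, fake_should_fail, fake_gt, fake_reset_async,
fake_gt_reset) are hypothetical stand-ins; fake_should_fail() replays a
scripted sequence instead of the real probabilistic should_fail().

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct fault_attr: replays scripted outcomes. */
struct fake_fault_attr {
	const bool *outcomes;	/* scripted results, one per sample */
	size_t count;		/* number of scripted results */
	size_t next;		/* number of samples taken so far */
};

/* Hypothetical stand-in for should_fail(); false once the script runs out. */
static bool fake_should_fail(struct fake_fault_attr *attr)
{
	if (attr->next >= attr->count)
		return false;
	return attr->outcomes[attr->next++];
}

/* Hypothetical per-GT state carrying the decision across the cycle. */
struct fake_gt {
	bool inject_reset_failure;
	bool reset_queued;
};

/* Models xe_gt_reset_async(): sample exactly once and record the result. */
static void fake_reset_async(struct fake_gt *gt, struct fake_fault_attr *attr)
{
	gt->inject_reset_failure = fake_should_fail(attr);
	gt->reset_queued = true;
}

/* Models gt_reset(): consume the recorded decision, never sample again. */
static int fake_gt_reset(struct fake_gt *gt)
{
	bool inject = gt->inject_reset_failure;

	gt->inject_reset_failure = false;
	gt->reset_queued = false;
	return inject ? -ECANCELED : 0;
}
```

With this shape both steps of one cycle see the same decision, and each
cycle draws only one sample from the attribute.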

Regards,
Bala
>  		return;
>  
>  	xe_gt_info(gt, "reset queued\n");
> diff --git a/drivers/gpu/drm/xe/xe_gt.h b/drivers/gpu/drm/xe/xe_gt.h
> index 7298653a73de..caded203a8a0 100644
> --- a/drivers/gpu/drm/xe/xe_gt.h
> +++ b/drivers/gpu/drm/xe/xe_gt.h
> @@ -7,6 +7,7 @@
>  #define _XE_GT_H_
>  
>  #include <drm/drm_util.h>
> +#include <linux/fault-inject.h>
>  
>  #include "xe_device_types.h"
>  #include "xe_hw_engine.h"
> @@ -16,6 +17,19 @@
>  		for_each_if(((hwe__) = (gt__)->hw_engines + (id__)) && \
>  			  xe_hw_engine_is_valid((hwe__)))
>  
> +#ifdef CONFIG_FAULT_INJECTION
> +extern struct fault_attr gt_reset_failure;
> +static inline bool xe_fault_inject_gt_reset(void)
> +{
> +	return should_fail(&gt_reset_failure, 1);
> +}
> +#else
> +static inline bool xe_fault_inject_gt_reset(void)
> +{
> +	return false;
> +}
> +#endif
> +
>  struct xe_gt *xe_gt_alloc(struct xe_tile *tile);
>  int xe_gt_init_early(struct xe_gt *gt);
>  int xe_gt_init(struct xe_gt *gt);
> -- 
> 2.25.1
> 
