All of lore.kernel.org
 help / color / mirror / Atom feed
From: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
To: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>,
	<intel-xe@lists.freedesktop.org>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>,
	Rodrigo Vivi <rodrigo.vivi@intel.com>
Subject: Re: [Intel-xe] [PATCH v9 3/3] drm/xe: Introduce fault injection for gt reset
Date: Mon, 31 Jul 2023 20:00:01 +0530	[thread overview]
Message-ID: <ZMfFaXW7607Cy45j@bvivekan-mobl> (raw)
In-Reply-To: <20230726232650.3873897-4-himal.prasad.ghimiray@intel.com>

On 27.07.2023 04:56, Himal Prasad Ghimiray wrote:
> To trigger gt reset failure:
>  echo 100 >  /sys/kernel/debug/dri/<cardX>/fail_gt_reset/probability
>  echo 2 >  /sys/kernel/debug/dri/<cardX>/fail_gt_reset/times
> 
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> 
> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_debugfs.c | 10 ++++++++++
>  drivers/gpu/drm/xe/xe_gt.c      |  8 +++++++-
>  drivers/gpu/drm/xe/xe_gt.h      | 14 ++++++++++++++
>  3 files changed, 31 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_debugfs.c b/drivers/gpu/drm/xe/xe_debugfs.c
> index 047341d5689a..1fd016e6f7a0 100644
> --- a/drivers/gpu/drm/xe/xe_debugfs.c
> +++ b/drivers/gpu/drm/xe/xe_debugfs.c
> @@ -5,6 +5,7 @@
>  
>  #include "xe_debugfs.h"
>  
> +#include <linux/fault-inject.h>
>  #include <linux/string_helpers.h>
>  
>  #include <drm/drm_debugfs.h>
> @@ -20,6 +21,10 @@
>  #include "xe_vm.h"
>  #endif
>  
> +#ifdef CONFIG_FAULT_INJECTION
> +DECLARE_FAULT_ATTR(gt_reset_failure);
> +#endif
> +
>  static struct xe_device *node_to_xe(struct drm_info_node *node)
>  {
>  	return to_xe_device(node->minor->dev);
> @@ -135,4 +140,9 @@ void xe_debugfs_register(struct xe_device *xe)
>  
>  	for_each_gt(gt, xe, id)
>  		xe_gt_debugfs_register(gt);
> +
> +#ifdef CONFIG_FAULT_INJECTION
> +	fault_create_debugfs_attr("fail_gt_reset", root, &gt_reset_failure);
> +#endif
> +
>  }
> diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> index 5e70e486b27c..691e3baf97c9 100644
> --- a/drivers/gpu/drm/xe/xe_gt.c
> +++ b/drivers/gpu/drm/xe/xe_gt.c
> @@ -525,6 +525,11 @@ static int gt_reset(struct xe_gt *gt)
>  
>  	xe_gt_info(gt, "reset started\n");
>  
> +	if (xe_fault_inject_gt_reset()) {
> +		err = -ECANCELED;
> +		goto err_fail;
> +	}
> +
>  	xe_gt_sanitize(gt);
>  
>  	xe_device_mem_access_get(gt_to_xe(gt));
> @@ -563,6 +568,7 @@ static int gt_reset(struct xe_gt *gt)
>  err_msg:
>  	XE_WARN_ON(xe_uc_start(&gt->uc));
>  	xe_device_mem_access_put(gt_to_xe(gt));
> +err_fail:
>  	xe_gt_err(gt, "reset failed (%pe)\n", ERR_PTR(err));
>  
>  	/* Notify userspace about gt reset failure */
> @@ -584,7 +590,7 @@ void xe_gt_reset_async(struct xe_gt *gt)
>  	xe_gt_info(gt, "trying reset\n");
>  
>  	/* Don't do a reset while one is already in flight */
> -	if (xe_uc_reset_prepare(&gt->uc))
> +	if (!xe_fault_inject_gt_reset() && xe_uc_reset_prepare(&gt->uc))

When `fail_gt_reset/probability` is set to a less than 100 value, then
xe_fault_inject_gt_reset() will not always return true. So if the
xe_fault_inject_gt_reset() returns differenet values when invoked from
xe_gt_reset_async() and gt_reset(), we will have unexpected behaviour.

We should avoid calling xe_fault_inject_gt_reset() more than once in a
single reset cycle.
We could exit immediately in xe_gt_reset_async() if fault injection is
enabled.

Regards,
Bala
>  		return;
>  
>  	xe_gt_info(gt, "reset queued\n");
> diff --git a/drivers/gpu/drm/xe/xe_gt.h b/drivers/gpu/drm/xe/xe_gt.h
> index 7298653a73de..caded203a8a0 100644
> --- a/drivers/gpu/drm/xe/xe_gt.h
> +++ b/drivers/gpu/drm/xe/xe_gt.h
> @@ -7,6 +7,7 @@
>  #define _XE_GT_H_
>  
>  #include <drm/drm_util.h>
> +#include <linux/fault-inject.h>
>  
>  #include "xe_device_types.h"
>  #include "xe_hw_engine.h"
> @@ -16,6 +17,19 @@
>  		for_each_if(((hwe__) = (gt__)->hw_engines + (id__)) && \
>  			  xe_hw_engine_is_valid((hwe__)))
>  
> +#ifdef CONFIG_FAULT_INJECTION
> +extern struct fault_attr gt_reset_failure;
> +static inline bool xe_fault_inject_gt_reset(void)
> +{
> +	return should_fail(&gt_reset_failure, 1);
> +}
> +#else
> +static inline bool xe_fault_inject_gt_reset(void)
> +{
> +	return false;
> +}
> +#endif
> +
>  struct xe_gt *xe_gt_alloc(struct xe_tile *tile);
>  int xe_gt_init_early(struct xe_gt *gt);
>  int xe_gt_init(struct xe_gt *gt);
> -- 
> 2.25.1
> 

  reply	other threads:[~2023-07-31 14:31 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-26 23:26 [Intel-xe] [PATCH v9 0/3] Notify userspace about uevent failure Himal Prasad Ghimiray
2023-07-26 23:26 ` [Intel-xe] [PATCH v9 1/3] fault-inject: Include linux/types.h by default Himal Prasad Ghimiray
2023-07-28 16:49   ` Rodrigo Vivi
2023-07-26 23:26 ` [Intel-xe] [PATCH v9 2/3] drm/xe: Notify Userspace when gt reset fails Himal Prasad Ghimiray
2023-07-26 23:26 ` [Intel-xe] [PATCH v9 3/3] drm/xe: Introduce fault injection for gt reset Himal Prasad Ghimiray
2023-07-31 14:30   ` Balasubramani Vivekanandan [this message]
2023-07-31 15:24     ` Ghimiray, Himal Prasad
2023-08-01  6:02       ` Balasubramani Vivekanandan
2023-08-01  6:45         ` Ghimiray, Himal Prasad
2023-08-01  8:03           ` Balasubramani Vivekanandan
2023-08-01  8:04             ` Ghimiray, Himal Prasad
2023-07-27  0:29 ` [Intel-xe] ✓ CI.Patch_applied: success for Notify userspace about uevent failure Patchwork
2023-07-27  0:29 ` [Intel-xe] ✗ CI.checkpatch: warning " Patchwork
2023-07-27  0:30 ` [Intel-xe] ✓ CI.KUnit: success " Patchwork
2023-07-27  0:34 ` [Intel-xe] ✓ CI.Build: " Patchwork
2023-07-27  0:35 ` [Intel-xe] ✓ CI.Hooks: " Patchwork
2023-07-27  0:36 ` [Intel-xe] ✗ CI.checksparse: warning " Patchwork
2023-07-27  1:10 ` [Intel-xe] ○ CI.BAT: info " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZMfFaXW7607Cy45j@bvivekan-mobl \
    --to=balasubramani.vivekanandan@intel.com \
    --cc=himal.prasad.ghimiray@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=lucas.demarchi@intel.com \
    --cc=rodrigo.vivi@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.