Intel-XE Archive on lore.kernel.org
From: Michal Wajdeczko <michal.wajdeczko@intel.com>
To: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>,
	<intel-xe@lists.freedesktop.org>,
	Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>,
	Tomasz Lis <tomasz.lis@intel.com>
Subject: Re: [PATCH v2 2/2] drm/xe/vf: Use fault injection for testing VF double migration feature
Date: Fri, 24 Oct 2025 13:52:28 +0200
Message-ID: <a5948b67-4145-46f6-b83d-3831d50ba879@intel.com>
In-Reply-To: <20251023153616.3790-6-satyanarayana.k.v.p@intel.com>



On 10/23/2025 5:36 PM, Satyanarayana K V P wrote:
> The VF migration process sends a marker to the GUC before starting
> resource fixups, and sends the same marker with the RESFIX_DONE
> notification. This prevents the GUC from submitting jobs to hardware
> during double migration events.
> 
> Testing double migration requires triggering a second migration while the
> first migration's fixups are in progress. Since fixups complete quickly,
> this scenario is difficult to reproduce reliably. Use fault injection
> framework to add a 10-second delay in xe_should_delay_vf_post_fixups()
> during the post-fixup phase, creating a reliable testing window for
> triggering subsequent migrations.

I'm not sure we should abuse the fault-injection framework for this.

Can't we simply expose some debugfs entries that hold the delays we want
to insert at specific places?

That would allow passing different delay values without updating the code,
and would actually require less code, with no tricks, to implement:

gt_types.h:

	// gt->sriov.vf.migration.debug.resfix_done_delay
	// gt->sriov.vf.migration.debug.resfix_start_delay

gt_debugfs.c:

	if (IS_ENABLED(CONFIG_DRM_XE_DEBUG)) {
		debugfs_create_ulong("resfix_done_delay", ..., resfix_done_delay);
		debugfs_create_ulong("resfix_start_delay", ..., resfix_start_delay);
		...
	}

gt_sriov_vf.c:

	if (resfix_done_delay)
		msleep(resfix_done_delay);
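
The tunable-delay idea above can be modeled in plain userspace C. This is
only a sketch under stated assumptions: struct vf_migration_debug,
sleep_ms(), vf_debug_delay() and elapsed_ms() are hypothetical names, the
delays are assumed to be in milliseconds (as msleep() takes), and
nanosleep() stands in for the kernel's msleep(). It is not driver code.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for gt->sriov.vf.migration.debug. */
struct vf_migration_debug {
	unsigned long resfix_start_delay;	/* milliseconds, 0 = no delay */
	unsigned long resfix_done_delay;	/* milliseconds, 0 = no delay */
};

/* Userspace replacement for msleep(): sleep for at least @ms milliseconds. */
static void sleep_ms(unsigned long ms)
{
	struct timespec ts = {
		.tv_sec = (time_t)(ms / 1000),
		.tv_nsec = (long)(ms % 1000) * 1000000L,
	};

	/* On EINTR, nanosleep() stores the remaining time back into ts. */
	while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
		;
}

/*
 * Apply an optional debug delay. A zero value (the default when nothing
 * was written to the tunable) makes this a no-op.
 * Returns true if a delay was applied.
 */
static bool vf_debug_delay(unsigned long delay_ms)
{
	if (!delay_ms)
		return false;

	sleep_ms(delay_ms);
	return true;
}

/* Whole milliseconds elapsed between two timestamps. */
static long elapsed_ms(const struct timespec *start, const struct timespec *end)
{
	return (long)(end->tv_sec - start->tv_sec) * 1000L +
	       (end->tv_nsec - start->tv_nsec) / 1000000L;
}
```

With both fields left at zero the hook costs one branch; a test would set
a field to the wanted value, for example 10000 for a 10-second window,
before triggering the first migration.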

And in a future case, when we need or want to delay some actions during
probe, i.e. before debugfs is available, we can just add a configfs entry
with default delays for the driver to use, to allow testing corner cases:

xe_configfs.c:

	/sys/kernel/config/xe/BDF/debug/default_delay

	int xe_config_default_delay(xe) { ... }

gt_sriov_vf.c:

	gt->sriov.vf.migration.debug.resfix_done_delay = xe_config_default_delay(xe);
	gt->sriov.vf.migration.debug.resfix_start_delay = xe_config_default_delay(xe);
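
The seeding step can be sketched the same way in userspace C. Everything
here is an assumption made for illustration: struct xe_config_model,
config_default_delay() and vf_migration_debug_seed() are hypothetical
names, and the real configfs plumbing is not modeled at all. The sketch
only shows that both delays start from one configured default and can
later be overridden independently.

```c
#include <stddef.h>

/* Hypothetical model of the per-device configfs "debug" group. */
struct xe_config_model {
	unsigned long default_delay;	/* milliseconds, 0 = no delay */
};

/* Hypothetical stand-in for gt->sriov.vf.migration.debug. */
struct vf_migration_debug {
	unsigned long resfix_start_delay;
	unsigned long resfix_done_delay;
};

/* Return the configured default, or 0 when no config group exists. */
static unsigned long config_default_delay(const struct xe_config_model *cfg)
{
	return cfg ? cfg->default_delay : 0;
}

/*
 * Seed both delays from the configured default at init time, so that a
 * delay can already apply before any runtime tunable is available.
 */
static void vf_migration_debug_seed(struct vf_migration_debug *dbg,
				    const struct xe_config_model *cfg)
{
	dbg->resfix_done_delay = config_default_delay(cfg);
	dbg->resfix_start_delay = config_default_delay(cfg);
}
```

A missing config group leaves both delays at zero, which keeps the
default behavior unchanged.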


@Lucas ?


> 
> Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Tomasz Lis <tomasz.lis@intel.com>
> 
> ---
> V1 -> V2:
> - New commit
> ---
>  drivers/gpu/drm/xe/xe_gt_sriov_vf.c | 29 +++++++++++++++++++++++++++++
>  1 file changed, 29 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
> index 8c1448d6c81d..63d43553ae4f 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
> @@ -5,6 +5,7 @@
>  
>  #include <linux/bitfield.h>
>  #include <linux/bsearch.h>
> +#include <linux/delay.h>
>  
>  #include <drm/drm_managed.h>
>  #include <drm/drm_print.h>
> @@ -1183,6 +1184,30 @@ static size_t post_migration_scratch_size(struct xe_device *xe)
>  	return max(xe_lrc_reg_size(xe), LRC_WA_BB_SIZE);
>  }
>  
> +#if defined(CONFIG_DRM_XE_DEBUG) && defined(CONFIG_FUNCTION_ERROR_INJECTION)
> +static noinline int xe_should_delay_vf_post_fixups(void)
> +{
> +	return 0;
> +}
> +ALLOW_ERROR_INJECTION(xe_should_delay_vf_post_fixups, ERRNO);
> +
> +static void vf_post_migration_fixup_delay(struct xe_gt *gt)
> +{
> +	int err = xe_should_delay_vf_post_fixups();
> +	unsigned long delay = 10 * USEC_PER_SEC;
> +
> +	if (err == -ETIME) {
> +		xe_gt_sriov_dbg(gt, "Delaying fixups by %ld secs\n",
> +				delay / USEC_PER_SEC);
> +		fsleep(delay);
> +	} else {
> +		return;
> +	}
> +}
> +#else
> +static inline void vf_post_migration_fixup_delay(struct xe_gt *gt) { }
> +#endif
> +
>  static int vf_post_migration_fixups(struct xe_gt *gt)
>  {
>  	void *buf = gt->sriov.vf.migration.scratch;
> @@ -1196,6 +1221,8 @@ static int vf_post_migration_fixups(struct xe_gt *gt)
>  	if (xe_gt_is_main_type(gt))
>  		xe_sriov_vf_ccs_rebase(gt_to_xe(gt));
>  
> +	vf_post_migration_fixup_delay(gt);
> +
>  	xe_gt_sriov_vf_default_lrcs_hwsp_rebase(gt);
>  	err = xe_guc_contexts_hwsp_rebase(&gt->uc.guc, buf);
>  	if (err)
> @@ -1304,6 +1331,8 @@ static void vf_post_migration_recovery(struct xe_gt *gt)
>  	if (err)
>  		goto fail;
>  
> +	vf_post_migration_fixup_delay(gt);
> +
>  	vf_post_migration_rearm(gt);
>  
>  	err = vf_post_migration_notify_resfix_done(gt, marker);


Thread overview: 11+ messages
2025-10-23 15:36 [PATCH v2 0/2] VF double migration Satyanarayana K V P
2025-10-23 15:36 ` [PATCH v2 1/2] drm/xe/vf: Introduce RESFIX start marker support Satyanarayana K V P
2025-10-23 20:37   ` Matthew Brost
2025-10-23 20:54     ` Matthew Brost
2025-10-31 20:10     ` Matthew Brost
2025-10-23 23:33   ` Michal Wajdeczko
2025-10-23 15:36 ` [PATCH v2 2/2] drm/xe/vf: Use fault injection for testing VF double migration feature Satyanarayana K V P
2025-10-24 11:52   ` Michal Wajdeczko [this message]
2025-10-23 17:18 ` ✓ CI.KUnit: success for VF double migration (rev2) Patchwork
2025-10-23 18:17 ` ✗ Xe.CI.BAT: failure " Patchwork
2025-10-24  5:23 ` ✗ Xe.CI.Full: " Patchwork