Intel-XE Archive on lore.kernel.org
From: "K V P, Satyanarayana" <satyanarayana.k.v.p@intel.com>
To: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>,
	<intel-xe@lists.freedesktop.org>
Cc: "Michał Wajdeczko" <michal.wajdeczko@intel.com>,
	"Michał Winiarski" <michal.winiarski@intel.com>
Subject: Re: [PATCH] drm/xe/pf: Add tracepoints for VF migration state and data transfer
Date: Thu, 27 Nov 2025 16:30:16 +0530	[thread overview]
Message-ID: <fcce0fd8-330e-4a69-9d11-9bbd85cf04e1@intel.com> (raw)
In-Reply-To: <20251127104131.591299-1-marcin.bernatowicz@linux.intel.com>


On 27-Nov-25 4:11 PM, Marcin Bernatowicz wrote:
> VF migration currently provides no lightweight visibility into its internal
> save/restore flow, making it difficult to measure migration latency, VF
> stun time, or the amount of data transferred per VF and per data type
> (VRAM/GGTT/MMIO/GuC).
>
> This patch introduces two tracepoints that instrument the VF state machine
> and each migration-data producer:
>
>    * xe_sriov_vf_state
>        - emits the tile/gt/vf identifier and the effective VF state
>        - placed on SAVE_WIP, SAVED, RESTORE_WIP and RESTORED transitions
>        - allows tracking the duration of individual save/restore phases
>          for each VF
>
>    * xe_sriov_vf_mig_data_chunk
>        - records the type of migrated data (VRAM/GGTT/MMIO/GuC), direction
>          (save/restore), offset and size in bytes
>        - emitted once per chunk after it is produced/consumed
>        - enables reconstruction of migration throughput and per-type byte
>          distribution
>
> These tracepoints are low-overhead when disabled and integrate with
> existing tooling such as trace-cmd and perf. They make it possible to
> correlate VF state transitions with data movement timings and to profile
> migration behaviour under various workloads or provisioning scenarios.

These traces are useful for seeing where most of the time is spent 
during migration.

Can we add some debugfs entries in the PF which show consolidated save 
and restore times for each component?

>
> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
> Cc: Michał Wajdeczko <michal.wajdeczko@intel.com>
> Cc: Michał Winiarski <michal.winiarski@intel.com>
> Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
> ---
>   drivers/gpu/drm/xe/xe_gt_sriov_pf_control.c   |  8 +++
>   drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c | 49 +++++++++++++++++++
>   drivers/gpu/drm/xe/xe_trace.h                 | 47 ++++++++++++++++++
>   3 files changed, 104 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_control.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_control.c
> index bf48b05797de..de327ef14218 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_control.c
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_control.c
> @@ -24,6 +24,7 @@
>   #include "xe_sriov_pf_migration.h"
>   #include "xe_sriov_pf_service.h"
>   #include "xe_tile.h"
> +#include "xe_trace.h"
>   
>   static const char *control_cmd_to_string(u32 cmd)
>   {
> @@ -836,6 +837,8 @@ static void pf_enter_vf_saved(struct xe_gt *gt, unsigned int vfid)
>   		pf_enter_vf_state_machine_bug(gt, vfid);
>   
>   	xe_gt_sriov_dbg(gt, "VF%u saved!\n", vfid);
> +	trace_xe_sriov_vf_state(gt, vfid,
> +				XE_GT_SRIOV_STATE_SAVED);
>   
>   	pf_expect_vf_state(gt, vfid, XE_GT_SRIOV_STATE_PAUSED);
>   	pf_exit_vf_mismatch(gt, vfid);
> @@ -944,6 +947,7 @@ static void pf_exit_vf_save_wait_data(struct xe_gt *gt, unsigned int vfid)
>   static bool pf_enter_vf_save_wip(struct xe_gt *gt, unsigned int vfid)
>   {
>   	if (pf_enter_vf_state(gt, vfid, XE_GT_SRIOV_STATE_SAVE_WIP)) {
> +		trace_xe_sriov_vf_state(gt, vfid, XE_GT_SRIOV_STATE_SAVE_WIP);
>   		xe_gt_sriov_pf_migration_save_init(gt, vfid);
>   		pf_enter_vf_wip(gt, vfid);
>   		pf_enter_vf_state(gt, vfid, XE_GT_SRIOV_STATE_SAVE_PROCESS_DATA);
> @@ -1113,6 +1117,8 @@ static void pf_enter_vf_restored(struct xe_gt *gt, unsigned int vfid)
>   		pf_enter_vf_state_machine_bug(gt, vfid);
>   
>   	xe_gt_sriov_dbg(gt, "VF%u restored!\n", vfid);
> +	trace_xe_sriov_vf_state(gt, vfid,
> +				XE_GT_SRIOV_STATE_RESTORED);
>   
>   	pf_expect_vf_state(gt, vfid, XE_GT_SRIOV_STATE_PAUSED);
>   	pf_exit_vf_mismatch(gt, vfid);
> @@ -1195,6 +1201,8 @@ static void pf_exit_vf_restore_wait_data(struct xe_gt *gt, unsigned int vfid)
>   static bool pf_enter_vf_restore_wip(struct xe_gt *gt, unsigned int vfid)
>   {
>   	if (pf_enter_vf_state(gt, vfid, XE_GT_SRIOV_STATE_RESTORE_WIP)) {
> +		trace_xe_sriov_vf_state(gt, vfid,
> +					XE_GT_SRIOV_STATE_RESTORE_WIP);
>   		pf_enter_vf_wip(gt, vfid);
>   		pf_enter_vf_state(gt, vfid, XE_GT_SRIOV_STATE_RESTORE_PROCESS_DATA);
>   		pf_queue_vf(gt, vfid);
> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c
> index d5d918ddce4f..0e8f5e8e5d7a 100644
> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c
> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c
> @@ -25,6 +25,7 @@
>   #include "xe_sriov_packet.h"
>   #include "xe_sriov_packet_types.h"
>   #include "xe_sriov_pf_migration.h"
> +#include "xe_trace.h"
>   
>   #define XE_GT_SRIOV_PF_MIGRATION_RING_SIZE 5
>   
> @@ -86,6 +87,12 @@ static int pf_save_vf_ggtt_mig_data(struct xe_gt *gt, unsigned int vfid)
>   	if (ret)
>   		goto fail;
>   
> +	trace_xe_sriov_vf_mig_data_chunk(gt, vfid,
> +					 XE_SRIOV_PACKET_TYPE_GGTT,
> +					 true,   /* is_save */
> +					 data->hdr.offset,
> +					 data->hdr.size);
> +
>   	return 0;

We normally use zero for save and one for restore.
Can we define macros and use them instead of true and false for better readability?
-Satya

>   fail:
> @@ -108,6 +115,12 @@ static int pf_restore_vf_ggtt_mig_data(struct xe_gt *gt, unsigned int vfid,
>   		return ret;
>   	}
>   
> +	trace_xe_sriov_vf_mig_data_chunk(gt, vfid,
> +					 XE_SRIOV_PACKET_TYPE_GGTT,
> +					 false, /* restore */
> +					 data->hdr.offset,
> +					 data->hdr.size);
> +
>   	return 0;
>   }
>   
> @@ -274,6 +287,12 @@ static int pf_save_vf_guc_mig_data(struct xe_gt *gt, unsigned int vfid)
>   	if (ret)
>   		goto fail_free;
>   
> +	trace_xe_sriov_vf_mig_data_chunk(gt, vfid,
> +					 XE_SRIOV_PACKET_TYPE_GUC,
> +					 true, /* is_save */
> +					 data->hdr.offset,
> +					 data->hdr.size);
> +
>   	return 0;
>   
>   fail_free:
> @@ -332,6 +351,12 @@ static int pf_restore_vf_guc_state(struct xe_gt *gt, unsigned int vfid,
>   	if (ret < 0)
>   		goto fail;
>   
> +	trace_xe_sriov_vf_mig_data_chunk(gt, vfid,
> +					 XE_SRIOV_PACKET_TYPE_GUC,
> +					 false, /* restore */
> +					 data->hdr.offset,
> +					 data->hdr.size);
> +
>   	return 0;
>   
>   fail:
> @@ -442,6 +467,12 @@ static int pf_save_vf_mmio_mig_data(struct xe_gt *gt, unsigned int vfid)
>   	if (ret)
>   		goto fail;
>   
> +	trace_xe_sriov_vf_mig_data_chunk(gt, vfid,
> +					 XE_SRIOV_PACKET_TYPE_MMIO,
> +					 true,   /* is_save */
> +					 data->hdr.offset,
> +					 data->hdr.size);
> +
>   	return 0;
>   
>   fail:
> @@ -465,6 +496,12 @@ static int pf_restore_vf_mmio_mig_data(struct xe_gt *gt, unsigned int vfid,
>   		return ret;
>   	}
>   
> +	trace_xe_sriov_vf_mig_data_chunk(gt, vfid,
> +					 XE_SRIOV_PACKET_TYPE_MMIO,
> +					 false, /* restore */
> +					 data->hdr.offset,
> +					 data->hdr.size);
> +
>   	return 0;
>   }
>   
> @@ -589,6 +626,12 @@ static int pf_save_vram_chunk(struct xe_gt *gt, unsigned int vfid,
>   	if (ret)
>   		goto fail;
>   
> +	trace_xe_sriov_vf_mig_data_chunk(gt, vfid,
> +					 XE_SRIOV_PACKET_TYPE_VRAM,
> +					 true,  /* is_save */
> +					 data->hdr.offset,
> +					 data->hdr.size);
> +
>   	return 0;
>   
>   fail:
> @@ -672,6 +715,12 @@ static int pf_restore_vf_vram_mig_data(struct xe_gt *gt, unsigned int vfid,
>   
>   	xe_bo_put(vram);
>   
> +	trace_xe_sriov_vf_mig_data_chunk(gt, vfid,
> +					 XE_SRIOV_PACKET_TYPE_VRAM,
> +					 false, /* restore */
> +					 data->hdr.offset,
> +					 data->hdr.size);
> +
>   	return 0;
>   err:
>   	xe_bo_put(vram);
> diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
> index 79a97b086cb2..0e714b754179 100644
> --- a/drivers/gpu/drm/xe/xe_trace.h
> +++ b/drivers/gpu/drm/xe/xe_trace.h
> @@ -464,6 +464,53 @@ TRACE_EVENT(xe_exec_queue_reach_max_job_count,
>   		      __entry->class, __entry->logical_mask, __entry->guc_id)
>   );
>   
> +TRACE_EVENT(xe_sriov_vf_state,
> +	    TP_PROTO(const struct xe_gt *gt, u16 vfid, u32 vf_state),
> +	    TP_ARGS(gt, vfid, vf_state),
> +	    TP_STRUCT__entry(__field(u8,  tile_id)
> +			     __field(u8,  gt_id)
> +			     __field(u16, vfid)
> +			     __field(u32, vf_state)
> +			    ),
> +	    TP_fast_assign(__entry->tile_id  = gt->tile->id;
> +			   __entry->gt_id    = gt->info.id;
> +			   __entry->vfid     = vfid;
> +			   __entry->vf_state = vf_state;
> +			  ),
> +	    TP_printk("tile=%u gt=%u vf=%u state=0x%x",
> +		      __entry->tile_id, __entry->gt_id,
> +		      __entry->vfid, __entry->vf_state)
> +);
> +
> +TRACE_EVENT(xe_sriov_vf_mig_data_chunk,
> +	    TP_PROTO(const struct xe_gt *gt, u16 vfid,
> +		     u8 data_type, bool is_save,
> +		     u64 offset, u64 bytes),
> +	    TP_ARGS(gt, vfid, data_type, is_save, offset, bytes),
> +	    TP_STRUCT__entry(__field(u8,  tile_id)
> +			     __field(u8,  gt_id)
> +			     __field(u16, vfid)
> +			     __field(u8,  data_type)
> +			     __field(u8,  is_save)
> +			     __field(u64, offset)
> +			     __field(u64, bytes)
> +			    ),
> +	    TP_fast_assign(__entry->tile_id   = gt->tile->id;
> +			   __entry->gt_id     = gt->info.id;
> +			   __entry->vfid      = vfid;
> +			   __entry->data_type = data_type;
> +			   __entry->is_save   = is_save;
> +			   __entry->offset    = offset;
> +			   __entry->bytes     = bytes;
> +			  ),
> +	    TP_printk("tile=%u gt=%u vf=%u type=%u %s off=%llu size=%llu",
> +		      __entry->tile_id, __entry->gt_id, __entry->vfid,
> +		      __entry->data_type,
> +		      __entry->is_save ? "save" : "restore",
> +		      (unsigned long long)__entry->offset,
> +		      (unsigned long long)__entry->bytes)
> +);
> +
>   #endif
>   
>   /* This part must be outside protection */


Thread overview: 6+ messages
2025-11-27 10:41 [PATCH] drm/xe/pf: Add tracepoints for VF migration state and data transfer Marcin Bernatowicz
2025-11-27 11:00 ` K V P, Satyanarayana [this message]
2025-12-01 12:59   ` Michał Winiarski
2025-11-27 12:37 ` ✓ CI.KUnit: success for " Patchwork
2025-11-27 13:12 ` ✓ Xe.CI.BAT: " Patchwork
2025-11-27 14:39 ` ✗ Xe.CI.Full: failure " Patchwork
