From: Matthew Brost
To: intel-xe@lists.freedesktop.org
Subject: [PATCH v8 31/33] drm/xe/vf: Ensure media GT VF recovery runs after primary GT on PTL
Date: Tue, 7 Oct 2025 06:05:03 -0700
Message-Id: <20251007130505.2694829-32-matthew.brost@intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20251007130505.2694829-1-matthew.brost@intel.com>
References: <20251007130505.2694829-1-matthew.brost@intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-BeenThere: intel-xe@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Intel Xe graphics driver
Errors-To: intel-xe-bounces@lists.freedesktop.org
Sender: "Intel-xe"

It is possible that the media GT's VF post-migration recovery work item
gets scheduled before the primary GT's work item. Since the media GT
depends on the primary GT's work item to complete CCS restore, if the
media GT's work item is scheduled first, detect this condition and
re-queue the media GT's work item for a later time.

v5:
 - Adjust debug message (Tomasz)

Signed-off-by: Matthew Brost
Reviewed-by: Tomasz Lis
---
 drivers/gpu/drm/xe/xe_gt_sriov_vf.c | 29 +++++++++++++++++++++++++++--
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
index dc589cf6ec98..bc6e1729b77a 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
@@ -1102,8 +1102,22 @@ void xe_gt_sriov_vf_print_version(struct xe_gt *gt, struct drm_printer *p)
 		   pf_version->major, pf_version->minor);
 }
 
-static void vf_post_migration_shutdown(struct xe_gt *gt)
+static bool vf_post_migration_shutdown(struct xe_gt *gt)
 {
+	struct xe_device *xe = gt_to_xe(gt);
+
+	/*
+	 * On platforms where CCS must be restored by the primary GT, the media
+	 * GT's VF post-migration recovery must run afterward. Detect this case
+	 * and re-queue the media GT's restore work item if necessary.
+	 */
+	if (xe->info.needs_shared_vf_gt_wq && xe_gt_is_media_type(gt)) {
+		struct xe_gt *primary_gt = gt_to_tile(gt)->primary_gt;
+
+		if (xe_gt_sriov_vf_recovery_pending(primary_gt))
+			return true;
+	}
+
 	spin_lock_irq(&gt->sriov.vf.migration.lock);
 	gt->sriov.vf.migration.recovery_queued = false;
 	spin_unlock_irq(&gt->sriov.vf.migration.lock);
@@ -1111,6 +1125,8 @@ static void vf_post_migration_shutdown(struct xe_gt *gt)
 	xe_guc_ct_flush_and_stop(&gt->uc.guc.ct);
 	xe_guc_submit_pause(&gt->uc.guc);
 	xe_tlb_inval_reset(&gt->tlb_inval);
+
+	return false;
 }
 
 static size_t post_migration_scratch_size(struct xe_device *xe)
@@ -1188,11 +1204,14 @@ static void vf_post_migration_recovery(struct xe_gt *gt)
 {
 	struct xe_device *xe = gt_to_xe(gt);
 	int err;
+	bool retry;
 
 	xe_gt_sriov_dbg(gt, "migration recovery in progress\n");
 
 	xe_pm_runtime_get(xe);
-	vf_post_migration_shutdown(gt);
+	retry = vf_post_migration_shutdown(gt);
+	if (retry)
+		goto queue;
 
 	if (!xe_sriov_vf_migration_supported(xe)) {
 		xe_gt_sriov_err(gt, "migration is not supported\n");
@@ -1220,6 +1239,12 @@ static void vf_post_migration_recovery(struct xe_gt *gt)
 	xe_pm_runtime_put(xe);
 	xe_gt_sriov_err(gt, "migration recovery failed (%pe)\n", ERR_PTR(err));
 	xe_device_declare_wedged(xe);
+	return;
+
+queue:
+	xe_gt_sriov_info(gt, "Re-queuing migration recovery\n");
+	queue_work(gt->ordered_wq, &gt->sriov.vf.migration.worker);
+	xe_pm_runtime_put(xe);
 }
 
 static void migration_worker_func(struct work_struct *w)
-- 
2.34.1
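
For readers who want to see the ordering argument in isolation, the re-queue behavior described in the commit message can be modeled in userspace roughly as follows. This is only a sketch under stated assumptions, not driver code: every `toy_*` name and the `GT_PRIMARY`/`GT_MEDIA` enum are hypothetical, the FIFO array stands in for the shared ordered workqueue, the `recovery_pending` flag is assumed to stay set until that GT's recovery has completed, and the `needs_shared_vf_gt_wq` platform check is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's GT and ordered workqueue. */
enum gt_type { GT_PRIMARY, GT_MEDIA };

struct toy_gt {
	enum gt_type type;
	bool recovery_pending;	/* assumed: set until this GT's recovery is done */
	struct toy_gt *primary;	/* models gt_to_tile(gt)->primary_gt */
};

#define TOY_QUEUE_CAP 8

struct toy_wq {
	struct toy_gt *items[TOY_QUEUE_CAP];
	int head, tail;
};

/* Append a work item at the tail, as an ordered queue would. */
static void toy_queue_work(struct toy_wq *wq, struct toy_gt *gt)
{
	assert(wq->tail < TOY_QUEUE_CAP);
	wq->items[wq->tail++] = gt;
}

/* True when a media GT must wait because the primary GT is not done yet. */
static bool toy_must_retry(const struct toy_gt *gt)
{
	return gt->type == GT_MEDIA && gt->primary->recovery_pending;
}

/*
 * Drain the queue in FIFO order. A media item that runs too early is put
 * back at the tail instead of being processed. Completion order is written
 * to 'done'; the return value is the number of re-queues.
 */
static int toy_run(struct toy_wq *wq, enum gt_type *done, int *ndone)
{
	int requeues = 0;

	while (wq->head < wq->tail) {
		struct toy_gt *gt = wq->items[wq->head++];

		if (toy_must_retry(gt)) {
			toy_queue_work(wq, gt);
			requeues++;
			continue;
		}
		gt->recovery_pending = false;	/* recovery for this GT completes */
		done[(*ndone)++] = gt->type;
	}
	return requeues;
}

/* Scenario from the commit message: the media item is queued first. */
static int toy_media_first(enum gt_type order[2])
{
	struct toy_gt primary = { .type = GT_PRIMARY, .recovery_pending = true };
	struct toy_gt media = { .type = GT_MEDIA, .recovery_pending = true,
				.primary = &primary };
	struct toy_wq wq = { .head = 0, .tail = 0 };
	int ndone = 0;

	toy_queue_work(&wq, &media);
	toy_queue_work(&wq, &primary);
	return toy_run(&wq, order, &ndone);
}
```

Calling `toy_media_first()` from a test or a `main()` exercises the case the patch guards against: the media item is dequeued first, sees the primary GT still pending, and goes back to the tail, so the primary GT finishes before the media GT. The model relies on the primary item already being in the same queue; it says nothing about what the real driver does if that assumption does not hold.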